ci: cache ES Docker image and share containers across integration tests by MattDevy · Pull Request #1296 · elastic/go-elasticsearch

MattDevy · 2026-02-26T13:24:05Z

Summary

Cache the ES Docker image in CI using actions/cache. On cache hit the image loads in seconds; on miss it pulls once and saves a tarball for subsequent runs.
Share containers via TestMain: start one secure and one insecure ES container concurrently at the beginning of the test suite, replacing 6 sequential per-test container lifecycles with 2 parallel ones. No test file manages container lifecycle anymore.
Pin all workflow actions to commit SHAs (checkout v6.0.2, setup-go v6.3.0, cache v5.0.3).

Root cause

TestBase64BulkIndexing was the first test to run and triggered a cold, unauthenticated docker pull of the ES SNAPSHOT image via testcontainers — taking ~26 minutes on the GitHub Actions runner. All subsequent tests reused the cached image but each still spun up its own container (~30s startup + ~10s teardown each).

Expected impact

Scenario	Before	After
Cold (no cache)	~32 min	~4-5 min
Warm (cache hit)	~32 min	~2 min

Test plan

Verify the integration workflow passes on this PR
Confirm Docker image cache is populated on first run (docker save)
Confirm subsequent runs load from cache (docker load)
Verify all 6 integration tests still pass with shared containers

Integration tests were taking 32+ minutes because the first test triggered a cold pull of the ES SNAPSHOT image via testcontainers (~26 min), and each of the 6 tests spun up its own container. - Add Docker image caching to the workflow using actions/cache; on cache hit the image loads in seconds instead of pulling for minutes. - Add a shared TestMain that starts one secure and one insecure ES container concurrently, replacing 6 sequential container lifecycles with 2 parallel ones. - Pin all workflow actions to commit SHAs (checkout v6.0.2, setup-go v6.3.0, cache v5.0.3).

prodsecmachine · 2026-02-26T13:24:20Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

The Persistent transport test asserted exact equality of total_opened across 101 requests. With a shared ES container, background activity and transport warm-up cause a small number of extra connections. Check the delta is negligible (<=15) rather than zero.

github-actions · 2026-02-26T13:35:04Z

Super-linter summary

Language	Validation result
GITHUB_ACTIONS	Pass ✅
GITLEAKS	Pass ✅
GIT_MERGE_CONFLICT_MARKERS	Pass ✅
SPELL_CODESPELL	Pass ✅
YAML	Pass ✅
YAML_PRETTIER	Pass ✅

All files and directories linted successfully

For more information, see the GitHub Actions workflow run

Powered by Super-linter

Copilot

Pull request overview

This PR optimizes integration test CI performance by caching the Elasticsearch Docker image and sharing containers across tests via TestMain. The approach replaces per-test container lifecycle management (6 sequential starts/stops taking ~30s each) with 2 parallel container starts at suite initialization, reducing test execution time from ~32 minutes to ~2-5 minutes depending on cache status.

Changes:

Introduced shared test infrastructure using TestMain to start one secure and one insecure ES container concurrently for the entire test suite
Added Docker image caching in GitHub Actions workflow to avoid repeated image pulls
Pinned all workflow actions to specific commit SHAs for security and reproducibility

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
internal/testing/e2e/shared_test.go	New file implementing TestMain with concurrent container startup and shared configuration variables
internal/testing/e2e/json_reader_integration_test.go	Removed per-test container setup, now uses shared configuration
internal/testing/e2e/esapi_integration_test.go	Removed per-test container setup, now uses shared configuration
internal/testing/e2e/elasticsearch_integration_test.go	Removed per-test container setup, improved Persistent test with baseline/delta logic for shared container
internal/testing/e2e/bulk_indexer_integration_test.go	Removed per-test container setup, now uses shared configuration
internal/testing/e2e/base64_bulk_integration_test.go	Removed per-test container setup and external cluster fallback logic, now uses shared configuration
.github/workflows/test-integration.yml	Added ES Docker image caching step and pinned action versions to commit SHAs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-26T13:43:40Z

internal/testing/e2e/shared_test.go

+
+	"github.com/elastic/go-elasticsearch/v9"
+
+	"testing/containertest"


The import path "testing/containertest" is incorrect. Based on the module structure where this file is in internal/testing/e2e/ and the containertest package is in internal/testing/containertest/, the correct import should be just "containertest" (since both packages are under the same testing module as defined in internal/testing/go.mod). This will cause a compilation error.

Suggested change

"testing/containertest"

"containertest"

Copilot · 2026-02-26T13:43:41Z

.github/workflows/test-integration.yml

+        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
+        with:
+          path: /tmp/es-image.tar
+          key: es-image-${{ env.ELASTICSEARCH_VERSION }}-week-${{ github.run_number }}


The cache key includes github.run_number which is unique for every workflow run, defeating the purpose of caching. This will result in a cache miss on every run. The key should use a static value or time-based rotation (like week number) without the run_number. Consider using just es-image-${{ env.ELASTICSEARCH_VERSION }}-week-<static-week-calculation> or remove the -week-${{ github.run_number }} suffix entirely.

Suggested change

key: es-image-${{ env.ELASTICSEARCH_VERSION }}-week-${{ github.run_number }}

key: es-image-${{ env.ELASTICSEARCH_VERSION }}

Copilot · 2026-02-26T13:43:41Z

internal/testing/e2e/elasticsearch_integration_test.go

+			// With persistent connections most of the 101 requests reuse
+			// the same connection. Allow a small delta for transport
+			// warm-up and background ES activity on the shared container.
+			if delta > 15 {


The threshold of 15 new connections over 101 requests seems overly permissive for a persistent connection test. While accounting for shared container activity is reasonable, allowing ~15% new connections may mask genuine connection pool issues. Consider using a tighter threshold (e.g., 5-7) or implementing a more robust isolation mechanism. If tests run concurrently with t.Parallel(), this shared state could lead to flaky test results.

Suggested change

if delta > 15 {

if delta > 7 {

Copilot · 2026-02-26T13:43:41Z

internal/testing/e2e/shared_test.go

+	if err := secureSrv.Terminate(context.Background()); err != nil {
+		errs = append(errs, fmt.Errorf("secure container: %w", err))
+	}
+	if err := insecureSrv.Terminate(context.Background()); err != nil {
+		errs = append(errs, fmt.Errorf("insecure container: %w", err))


The cleanup code attempts to terminate containers even if their initialization failed. If secureSrv or insecureSrv is nil (due to an early error in NewElasticsearchService), calling Terminate() will panic. Add nil checks before calling Terminate() on each service.

Suggested change

if err := secureSrv.Terminate(context.Background()); err != nil {

errs = append(errs, fmt.Errorf("secure container: %w", err))

}

if err := insecureSrv.Terminate(context.Background()); err != nil {

errs = append(errs, fmt.Errorf("insecure container: %w", err))

if secureSrv != nil {

if err := secureSrv.Terminate(context.Background()); err != nil {

errs = append(errs, fmt.Errorf("secure container: %w", err))

}

}

if insecureSrv != nil {

if err := insecureSrv.Terminate(context.Background()); err != nil {

errs = append(errs, fmt.Errorf("insecure container: %w", err))

}

Copilot · 2026-02-26T13:43:42Z

internal/testing/e2e/shared_test.go

+	insecureCfg := insecureSrv.ESConfig()
+	sharedInsecureCfg = insecureCfg
+	sharedInsecureAddr = insecureCfg.Addresses[0]


The local variable insecureCfg on line 84 is unnecessary and creates a redundant intermediate variable. You can directly assign insecureSrv.ESConfig() to sharedInsecureCfg on line 85. Additionally, the sharedInsecureCfg variable is declared but never used anywhere in the codebase - only sharedInsecureAddr is used. Consider removing the unused variable or documenting if it's intentionally kept for future use.

This comment was marked as outdated.

Sign in to view

MattDevy requested a review from Copilot February 26, 2026 13:39

Copilot started reviewing on behalf of MattDevy February 26, 2026 13:40 View session

Copilot AI reviewed Feb 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: cache ES Docker image and share containers across integration tests#1296

ci: cache ES Docker image and share containers across integration tests#1296
MattDevy wants to merge 2 commits intomainfrom
ci/cache-docker-images

MattDevy commented Feb 26, 2026

Uh oh!

prodsecmachine commented Feb 26, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

github-actions bot commented Feb 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		"github.com/elastic/go-elasticsearch/v9"

		"testing/containertest"

	key: es-image-${{ env.ELASTICSEARCH_VERSION }}-week-${{ github.run_number }}
	key: es-image-${{ env.ELASTICSEARCH_VERSION }}

Conversation

MattDevy commented Feb 26, 2026

Summary

Root cause

Expected impact

Test plan

Uh oh!

prodsecmachine commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Snyk checks have passed. No issues have been found so far.

Uh oh!

This comment was marked as outdated.

github-actions bot commented Feb 26, 2026

Super-linter summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

prodsecmachine commented Feb 26, 2026 •

edited

Loading