Skip to content

Exclude empty clinical attribute values from legacy clinical-data reads#12178

Open
alisman wants to merge 14 commits into
masterfrom
fix/clinical-data-empty-string-na
Open

Exclude empty clinical attribute values from legacy clinical-data reads#12178
alisman wants to merge 14 commits into
masterfrom
fix/clinical-data-empty-string-na

Conversation

@alisman

@alisman alisman commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the group-comparison Clinical tab bar chart rendering an unlabeled legend entry where an "NA" category should appear (visible on staging/v7 but not production/v6).

Note: This branch is based on #12146, so it currently includes that PR's commits as well. Once #12146 merges, this diff reduces to just the clinical-data fix.

Root cause

The v7 ClickHouse import stores blank clinical cells as empty-string ('') rows in clinical_patient/clinical_sample, whereas v6 omitted them entirely. The legacy POST /api/clinical-data/fetch (ClinicalDataControllerClinicalDataMyBatisRepositoryClinicalDataMapper.xml, reading those tables directly) therefore returns a full cartesian — one row per patient/sample × attribute — with '' for missing values.

The comparison frontend (groupComparison/ClinicalData.tsx) synthesizes the "NA" legend category only for samples that have no returned row, so the new empty-string values become a distinct category with a blank legend label.

Confirmed live for comparison session 5ce411c7e4b0ab4137874076 (acc_tcga): staging (v7.0.0) returned 77 patients × 7 attrs = 539 rows with 93 empty values; production (v6.4.4) returned only actual values (446 rows, 0 empty). Both run ClickHouse, so this is a v6→v7 import-behavior difference, not MySQL-vs-ClickHouse.

Fix (mapper-level, option 2)

Filter attr_value != '' in the legacy clinical-data reads via a guarded excludeEmptyValues flag, enabled for:

  • the row-returning reads getSampleClinicalData / getPatientClinicalData (backing fetchClinicalData, fetchAllClinicalDataInStudy, getAllClinicalDataInStudy, getAllClinicalDataOf{Sample,Patient}InStudy), and
  • their count companions getMetaSampleClinicalData / getMetaPatientClinicalData, so the paginated endpoints' totalCount stays consistent with the rows actually returned.

The flag is required because the shared whereSample/wherePatient fragments are also reused by the value-distribution counts and clinical-table queries — and getVisibleSampleInternalIdsForClinicalTable has no clinical_sample in its FROM — so an unconditional filter would alter those (or be a SQL error). Study-view NA counting (columnar clinical_data_derived path) is untouched.

Testing

  • Added an empty-value fixture entity to clickhouse_data_legacy.sqlTCGA-EMPTY-VAL-PT / TCGA-EMPTY-VAL-PT-01 in acc_tcga, each with one non-empty (OTHER_PATIENT_ID / OTHER_SAMPLE_ID) and one empty (FORM_COMPLETION_DATE / DAYS_TO_COLLECTION) attribute.
  • Added regression tests in ClinicalDataMyBatisRepositoryTest covering patient fetch, sample fetch, fetchClinicalData, and meta-count, asserting the empty values are excluded and the non-empty ones returned.
  • Updated PatientMyBatisRepositoryTest global counts (18 → 19) for the added fixture patient.
  • ✅ Verified: all 398 *MyBatisRepositoryTest tests pass against a ClickHouse Testcontainer (JDK 21 / Maven).

Follow-ups (not in this PR)

  • The authoritative fix is in the importer (not this repo): don't load blank clinical cells at all.
  • getPatientClinicalDataDetailedToSample (clinical-data downloads) still returns blanks if full consistency is desired.

🤖 Generated with Claude Code

Copilot AI and others added 13 commits May 26, 2026 20:17
…eplaced endpoints

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/a916960c-0be3-4617-92c2-275403077c5c

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…ller tests

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/a916960c-0be3-4617-92c2-275403077c5c

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/3b0086f6-1a07-4e9b-a231-d2ed5bf692e1

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
The previous ContentCachingRequestWrapper from Spring doesn't replay the
request body after it's been consumed. When column-store controllers moved
from /api/column-store/* to /api/*, the InvolvedCancerStudyExtractorInterceptor
started matching their paths and consuming the body, leaving @RequestBody
with an empty stream.

Replace with CachedBodyRequestWrapper that reads the entire body upfront
into a byte array, then returns a fresh ByteArrayInputStream on every
getInputStream() call — allowing both the interceptor and the controller
to each read the body independently.

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/f7093d3a-1761-4110-b26f-1ffc78fc299b

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
The @hidden annotation and import were removed from this controller when
the endpoint was moved from /api/column-store/ to /api/ and made visible
in Swagger. Remove the outdated Javadoc note that still mentioned it.

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/38d7f7db-68a6-4cb7-924e-f84497177cb0

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
… controllers

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/f76c084e-e733-47bf-a853-e96c67639652

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…llers

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/709b574b-26fd-4a29-9ede-8da93a5a9304

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…teration Enrichments controllers

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/03f49ac6-9156-4c53-bdd2-dcbbe27022d9

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…te as fallback when @RequestBody is null

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/080e3d62-7e80-44ff-84f1-97a53747d33b

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…lers

- ColumnStoreMutationController: fix dual-@RequestBody bug; change
  interceptedMutationMultipleStudyFilter from @RequestBody to
  @RequestAttribute, use @RequestBody as fallback, add null guard
- ColumnStoreSampleController: add interceptedSampleFilter
  @RequestAttribute fallback, add null guard
- ColumnStoreClinicalDataEnrichmentController: add interceptedGroupFilter
  @RequestAttribute fallback, add null guard
- ColumnarStoreAlterationEnrichmentController: add
  interceptedMolecularProfileCasesGroupFilters and alterationEventTypes
  @RequestAttribute fallbacks, add null guard

Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/0a52189a-241f-4c66-9f5a-019f75667413

Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
The v7 ClickHouse import stores blank clinical cells as empty-string ('')
rows in clinical_patient/clinical_sample, whereas v6 omitted them. As a
result /clinical-data/fetch returned empty-valued rows for missing data,
breaking the group-comparison Clinical bar chart: the frontend labels a
category "NA" only for samples with no returned row, so empty-string
values rendered as an unlabeled legend entry.

Filter attr_value != '' in the legacy fetch/get clinical-data queries via
a guarded excludeEmptyValues flag, scoped so the shared whereSample /
wherePatient fragments still serve the counts/meta/clinical-table queries
(and getVisibleSampleInternalIdsForClinicalTable, which has no
clinical_sample in its FROM) unchanged. Add regression tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 21:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes legacy clinical-data reads returning empty-string ('') clinical values (introduced by the v7 ClickHouse import), which breaks downstream “NA” handling in the group-comparison Clinical tab. In addition (via the #12146 base), it rehomes legacy endpoints under /api/legacy, exposes column-store endpoints under /api, and updates associated tests/specs.

Changes:

  • Exclude empty-string clinical attribute values in legacy MyBatis clinical-data reads via a guarded excludeEmptyValues mapper flag, plus regression tests using GENIE fixtures.
  • Move legacy endpoints to /api/legacy (and hide them from OpenAPI), while moving column-store endpoints from /api/column-store/... to /api/...; update Java tests, e2e specs, and e2e JS tests accordingly.
  • Replace ContentCachingRequestWrapper usage with a custom cached-body request wrapper to allow the interceptor and controllers to read the body independently, and add request-attribute fallbacks in several column-store controllers.

Reviewed changes

Copilot reviewed 46 out of 53 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/api-e2e/specs/genomic-data-bin-counts.json Updates e2e spec URLs from /api/column-store/... to /api/....
test/api-e2e/specs/genie-public.json Updates e2e spec URL for clinical-data-counts endpoint.
test/api-e2e/specs/generic-assay-data-counts.json Updates e2e spec URLs to new non-/column-store paths.
test/api-e2e/specs/generic-assay-data-bin-counts.json Updates e2e spec URLs to new non-/column-store paths.
test/api-e2e/specs/custom-data-counts-filter.json Updates e2e spec URLs for clinical-data-counts.
test/api-e2e/specs/custom-data-bin-counts-filter.json Updates e2e spec URLs for clinical-data-counts.
src/test/java/org/cbioportal/legacy/web/StudyViewControllerTest.java Updates tests to hit /api/legacy/... endpoints.
src/test/java/org/cbioportal/legacy/web/StudyControllerTest.java Updates tests to hit /api/legacy/studies.
src/test/java/org/cbioportal/legacy/web/SampleControllerTest.java Updates tests to hit /api/legacy/... sample endpoints.
src/test/java/org/cbioportal/legacy/web/MutationControllerTest.java Updates tests to hit /api/legacy/mutations/fetch.
src/test/java/org/cbioportal/legacy/web/GenericAssayControllerTest.java Updates tests to hit /api/legacy/... generic assay endpoints.
src/test/java/org/cbioportal/legacy/web/CoExpressionControllerTest.java Updates tests to hit /api/legacy/... co-expression endpoint.
src/test/java/org/cbioportal/legacy/web/ClinicalDataEnrichmentControllerTest.java Updates tests to hit /api/legacy/clinical-data-enrichments/fetch.
src/test/java/org/cbioportal/legacy/web/ClinicalDataControllerTest.java Updates tests to hit /api/legacy/clinical-data/fetch.
src/test/java/org/cbioportal/legacy/web/AlterationEnrichmentControllerTest.java Updates tests to hit /api/legacy/alteration-enrichments/fetch.
src/test/java/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMyBatisRepositoryTest.java Adds regression tests ensuring empty-string clinical values are excluded.
src/test/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreGenericAssayControllerTest.java Updates test URLs for generic assay column-store endpoints.
src/main/resources/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMapper.xml Adds optional excludeEmptyValues WHERE clause filters for patient/sample clinical reads.
src/main/java/org/cbioportal/legacy/web/util/InvolvedCancerStudyExtractorInterceptor.java Adjusts path normalization to account for /api/legacy/....
src/main/java/org/cbioportal/legacy/web/util/ContentCachingRequestWrapperFilter.java Implements an eager cached-body wrapper to allow multiple body reads.
src/main/java/org/cbioportal/legacy/web/StudyViewController.java Moves endpoints under /legacy/... and hides them from OpenAPI.
src/main/java/org/cbioportal/legacy/web/studyview/CustomDataController.java Moves endpoint under /legacy/... and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/StudyController.java Moves study endpoints under /api/legacy/... and hides them from OpenAPI.
src/main/java/org/cbioportal/legacy/web/SampleController.java Moves controller under /api/legacy and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/MutationController.java Moves endpoint under /legacy/... and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/GenericAssayController.java Moves controller under /api/legacy and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/CoExpressionController.java Moves controller under /api/legacy and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/ClinicalDataEnrichmentController.java Moves controller under /api/legacy and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/ClinicalDataController.java Moves endpoint under /legacy/... and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/web/AlterationEnrichmentController.java Moves controller under /api/legacy and hides it from OpenAPI.
src/main/java/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMyBatisRepository.java Passes excludeEmptyValues=true to row-returning mapper reads.
src/main/java/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMapper.java Extends mapper method signatures with excludeEmptyValues flag.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreStudyController.java Moves column-store study endpoints under /api and updates mappings/annotations.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreSampleController.java Moves endpoints under /api and adds request-attribute fallback for filters.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreMutationController.java Moves endpoint under /api and adds request-attribute fallback for filters.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreGenericAssayController.java Moves endpoints under /api and makes them public in OpenAPI.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreCoExpressionController.java Moves endpoint under /api and updates request mapping style.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreClinicalDataEnrichmentController.java Moves endpoint under /api and adds request-attribute fallback for filters.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreClinicalDataController.java Moves endpoint under /api and adds request-attribute fallback for filters.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnarStoreStudyViewController.java Moves endpoints under /api and updates OpenAPI tagging/visibility.
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnarStoreAlterationEnrichmentController.java Moves endpoint under /api and adds request-attribute fallback for filters.
src/e2e/js/test/ColumnStoreStudyController/ColumnStoreStudyController.spec.ts Updates e2e JS test URLs to /api/....
src/e2e/js/test/ColumnStoreMutationController/ColumnStoreMutationController.spec.ts Updates e2e JS test URL to /api/mutations/fetch.
src/e2e/js/test/ColumnStoreCoExpressionController/ColumnStoreCoExpressionController.spec.ts Updates e2e JS test URL to /api/molecular-profiles/co-expressions/fetch.
src/e2e/js/src/utils.ts Updates helper URLs to /api/....
src/e2e/js/src/types.ts Updates documentation comments for updated endpoint sources.
src/e2e/js/README.md Updates referenced endpoint in docs.
Comments suppressed due to low confidence (1)

src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreClinicalDataController.java:132

  • @PreAuthorize is still evaluating permissions against #clinicalDataMultiStudyFilter, but the handler now treats the request-body filter as optional and falls back to interceptedClinicalDataMultiStudyFilter. If the request body is unavailable (e.g., consumed before Spring binds it), authorization will fail before the fallback can be used, and the filter == null early-return is effectively unreachable. Update the SpEL to authorize against the same effective filter (or against the extracted involvedCancerStudies request attribute) so the fallback path can work as intended.
  @PreAuthorize(
      "hasPermission(#clinicalDataMultiStudyFilter, 'ClinicalDataMultiStudyFilter', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)")
  @RequestMapping(
      method = RequestMethod.POST,
      value = "/clinical-data/fetch",
      consumes = MediaType.APPLICATION_JSON_VALUE,
      produces = MediaType.APPLICATION_JSON_VALUE)
  @Operation(description = "Fetch clinical data by patient IDs or sample IDs (all studies)")
  @ApiResponse(
      responseCode = "200",
      description = "OK",
      content =
          @Content(array = @ArraySchema(schema = @Schema(implementation = ClinicalData.class))))
  public ResponseEntity<List<ClinicalDataDTO>> fetchClinicalData(
      @Parameter(hidden = true)
          @RequestAttribute(required = false, value = "interceptedClinicalDataMultiStudyFilter")
          ClinicalDataMultiStudyFilter interceptedClinicalDataMultiStudyFilter,
      @Parameter(description = "Type of the clinical data") @RequestParam(defaultValue = "SAMPLE")
          ClinicalDataType clinicalDataType,
      @Parameter(
              required = true,
              description = "List of patient or sample identifiers and attribute IDs")
          @Valid
          @RequestBody(required = false)
          ClinicalDataMultiStudyFilter clinicalDataMultiStudyFilter,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 73 to 95
@@ -81,14 +86,21 @@ public ColumnStoreClinicalDataEnrichmentController(
array =
@ArraySchema(schema = @Schema(implementation = ClinicalDataEnrichmentDTO.class))))
public ResponseEntity<List<ClinicalDataEnrichmentDTO>> fetchClinicalEnrichments(
@Parameter(hidden = true)
@RequestAttribute(required = false, value = "interceptedGroupFilter")
GroupFilter interceptedGroupFilter,
@Parameter(required = true, description = "Group filter with sample identifiers")
@Valid
@RequestBody
@RequestBody(required = false)
GroupFilter groupFilter) {
Comment on lines 51 to 82
@PreAuthorize(
"hasPermission(#groupsAndAlterationTypes, 'MolecularProfileCasesGroupAndAlterationTypeFilter', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)")
@RequestMapping(
value = "/alteration-enrichments/fetch",
method = RequestMethod.POST,
consumes = MediaType.APPLICATION_JSON_VALUE,
produces = MediaType.APPLICATION_JSON_VALUE)
@Operation(summary = "Fetch alteration enrichments in molecular profiles")
@ApiResponse(
responseCode = "200",
description = "OK",
content =
@Content(
array = @ArraySchema(schema = @Schema(implementation = AlterationEnrichment.class))))
public ResponseEntity<Collection<AlterationEnrichment>> fetchAlterationEnrichments(
@Parameter(description = "Type of the enrichment e.g. SAMPLE or PATIENT")
@RequestParam(defaultValue = "SAMPLE")
EnrichmentType enrichmentType,
@Parameter(hidden = true)
@RequestAttribute(
required = false,
value = "interceptedMolecularProfileCasesGroupFilters")
List<MolecularProfileCasesGroupFilter> interceptedGroupFilters,
@Parameter(hidden = true) @RequestAttribute(required = false, value = "alterationEventTypes")
AlterationFilter interceptedAlterationEventTypes,
@Parameter(
required = true,
description =
"List of groups containing sample identifiers and list of Alteration Types")
@Valid
@RequestBody(required = false)
MolecularProfileCasesGroupAndAlterationTypeFilter groupsAndAlterationTypes)
Comment on lines 141 to +166
@PreAuthorize(
"hasPermission(#sampleFilter, 'SampleFilter', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)")
@RequestMapping(
value = "/samples/fetch",
method = RequestMethod.POST,
consumes = MediaType.APPLICATION_JSON_VALUE,
produces = MediaType.APPLICATION_JSON_VALUE)
@Operation(description = "Fetch samples by ID")
@ApiResponse(
responseCode = "200",
description = "OK",
content = @Content(array = @ArraySchema(schema = @Schema(implementation = Sample.class))))
public ResponseEntity<List<SampleDTO>> fetchSamples(
@Parameter(hidden = true)
@RequestAttribute(required = false, value = "interceptedSampleFilter")
SampleFilter interceptedSampleFilter,
@Parameter(required = true, description = "List of sample identifiers")
@Valid
@RequestBody(required = false)
SampleFilter sampleFilter,
@Parameter(description = "Level of detail of the response")
@RequestParam(defaultValue = "SUMMARY")
ProjectionType projection) {
SampleFilter effectiveFilter = sampleFilter != null ? sampleFilter : interceptedSampleFilter;
if (effectiveFilter == null) {
return new ResponseEntity<>(Collections.emptyList(), HttpStatus.OK);
Comment on lines 37 to +58
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain)
throws IOException, ServletException {
ContentCachingRequestWrapper wrappedRequest =
new ContentCachingRequestWrapper((HttpServletRequest) request);
CachedBodyRequestWrapper wrappedRequest =
new CachedBodyRequestWrapper((HttpServletRequest) request);
LOG.trace("Wrapping request for multiple reads of request body");
filterChain.doFilter(wrappedRequest, response);
}

/**
* Request wrapper that eagerly reads and caches the entire request body, then replays it on every
* {@link #getInputStream()} call. This ensures that both the {@link
* InvolvedCancerStudyExtractorInterceptor} (which consumes the body to extract study IDs) and the
* downstream controller (which uses {@code @RequestBody}) can each read the body independently.
*/
static class CachedBodyRequestWrapper extends HttpServletRequestWrapper {
private final byte[] cachedBody;

CachedBodyRequestWrapper(HttpServletRequest request) throws IOException {
super(request);
this.cachedBody = request.getInputStream().readAllBytes();
}
… tests

Extend the excludeEmptyValues filter to getMetaSampleClinicalData /
getMetaPatientClinicalData so the paginated clinical-data endpoints'
totalCount stays consistent with the rows actually returned by the
fetch/get reads (otherwise totalCount would still include the empty rows).

Add an empty-value fixture entity (TCGA-EMPTY-VAL-PT / -01 in acc_tcga) to
clickhouse_data_legacy.sql and regression tests in
ClinicalDataMyBatisRepositoryTest covering patient/sample/fetch reads and
meta counts. Update PatientMyBatisRepositoryTest global counts (18 -> 19)
for the added fixture patient.

Verified: all 398 *MyBatisRepositoryTest tests pass against a ClickHouse
Testcontainer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants