Exclude empty clinical attribute values from legacy clinical-data reads#12178
Exclude empty clinical attribute values from legacy clinical-data reads#12178alisman wants to merge 14 commits into
Conversation
…eplaced endpoints Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/a916960c-0be3-4617-92c2-275403077c5c Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…ller tests Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/a916960c-0be3-4617-92c2-275403077c5c Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/3b0086f6-1a07-4e9b-a231-d2ed5bf692e1 Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/c4ca0df8-044f-4598-9c7e-e21a4b2d901d Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
The previous ContentCachingRequestWrapper from Spring doesn't replay the request body after it's been consumed. When column-store controllers moved from /api/column-store/* to /api/*, the InvolvedCancerStudyExtractorInterceptor started matching their paths and consuming the body, leaving @RequestBody with an empty stream. Replace with CachedBodyRequestWrapper that reads the entire body upfront into a byte array, then returns a fresh ByteArrayInputStream on every getInputStream() call — allowing both the interceptor and the controller to each read the body independently. Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/f7093d3a-1761-4110-b26f-1ffc78fc299b Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/f7093d3a-1761-4110-b26f-1ffc78fc299b Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
The @hidden annotation and import were removed from this controller when the endpoint was moved from /api/column-store/ to /api/ and made visible in Swagger. Remove the outdated Javadoc note that still mentioned it. Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/38d7f7db-68a6-4cb7-924e-f84497177cb0 Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
… controllers Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/f76c084e-e733-47bf-a853-e96c67639652 Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…llers Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/709b574b-26fd-4a29-9ede-8da93a5a9304 Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…teration Enrichments controllers Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/03f49ac6-9156-4c53-bdd2-dcbbe27022d9 Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…te as fallback when @RequestBody is null Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/080e3d62-7e80-44ff-84f1-97a53747d33b Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
…lers - ColumnStoreMutationController: fix dual-@RequestBody bug; change interceptedMutationMultipleStudyFilter from @RequestBody to @RequestAttribute, use @RequestBody as fallback, add null guard - ColumnStoreSampleController: add interceptedSampleFilter @RequestAttribute fallback, add null guard - ColumnStoreClinicalDataEnrichmentController: add interceptedGroupFilter @RequestAttribute fallback, add null guard - ColumnarStoreAlterationEnrichmentController: add interceptedMolecularProfileCasesGroupFilters and alterationEventTypes @RequestAttribute fallbacks, add null guard Agent-Logs-Url: https://github.com/cBioPortal/cbioportal/sessions/0a52189a-241f-4c66-9f5a-019f75667413 Co-authored-by: onursumer <3604198+onursumer@users.noreply.github.com>
The v7 ClickHouse import stores blank clinical cells as empty-string ('')
rows in clinical_patient/clinical_sample, whereas v6 omitted them. As a
result /clinical-data/fetch returned empty-valued rows for missing data,
breaking the group-comparison Clinical bar chart: the frontend labels a
category "NA" only for samples with no returned row, so empty-string
values rendered as an unlabeled legend entry.
Filter attr_value != '' in the legacy fetch/get clinical-data queries via
a guarded excludeEmptyValues flag, scoped so the shared whereSample /
wherePatient fragments still serve the counts/meta/clinical-table queries
(and getVisibleSampleInternalIdsForClinicalTable, which has no
clinical_sample in its FROM) unchanged. Add regression tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes legacy clinical-data reads returning empty-string ('') clinical values (introduced by the v7 ClickHouse import), which breaks downstream “NA” handling in the group-comparison Clinical tab. In addition (via the #12146 base), it rehomes legacy endpoints under /api/legacy, exposes column-store endpoints under /api, and updates associated tests/specs.
Changes:
- Exclude empty-string clinical attribute values in legacy MyBatis clinical-data reads via a guarded
excludeEmptyValuesmapper flag, plus regression tests using GENIE fixtures. - Move legacy endpoints to
/api/legacy(and hide them from OpenAPI), while moving column-store endpoints from/api/column-store/...to/api/...; update Java tests, e2e specs, and e2e JS tests accordingly. - Replace
ContentCachingRequestWrapperusage with a custom cached-body request wrapper to allow the interceptor and controllers to read the body independently, and add request-attribute fallbacks in several column-store controllers.
Reviewed changes
Copilot reviewed 46 out of 53 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/api-e2e/specs/genomic-data-bin-counts.json | Updates e2e spec URLs from /api/column-store/... to /api/.... |
| test/api-e2e/specs/genie-public.json | Updates e2e spec URL for clinical-data-counts endpoint. |
| test/api-e2e/specs/generic-assay-data-counts.json | Updates e2e spec URLs to new non-/column-store paths. |
| test/api-e2e/specs/generic-assay-data-bin-counts.json | Updates e2e spec URLs to new non-/column-store paths. |
| test/api-e2e/specs/custom-data-counts-filter.json | Updates e2e spec URLs for clinical-data-counts. |
| test/api-e2e/specs/custom-data-bin-counts-filter.json | Updates e2e spec URLs for clinical-data-counts. |
| src/test/java/org/cbioportal/legacy/web/StudyViewControllerTest.java | Updates tests to hit /api/legacy/... endpoints. |
| src/test/java/org/cbioportal/legacy/web/StudyControllerTest.java | Updates tests to hit /api/legacy/studies. |
| src/test/java/org/cbioportal/legacy/web/SampleControllerTest.java | Updates tests to hit /api/legacy/... sample endpoints. |
| src/test/java/org/cbioportal/legacy/web/MutationControllerTest.java | Updates tests to hit /api/legacy/mutations/fetch. |
| src/test/java/org/cbioportal/legacy/web/GenericAssayControllerTest.java | Updates tests to hit /api/legacy/... generic assay endpoints. |
| src/test/java/org/cbioportal/legacy/web/CoExpressionControllerTest.java | Updates tests to hit /api/legacy/... co-expression endpoint. |
| src/test/java/org/cbioportal/legacy/web/ClinicalDataEnrichmentControllerTest.java | Updates tests to hit /api/legacy/clinical-data-enrichments/fetch. |
| src/test/java/org/cbioportal/legacy/web/ClinicalDataControllerTest.java | Updates tests to hit /api/legacy/clinical-data/fetch. |
| src/test/java/org/cbioportal/legacy/web/AlterationEnrichmentControllerTest.java | Updates tests to hit /api/legacy/alteration-enrichments/fetch. |
| src/test/java/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMyBatisRepositoryTest.java | Adds regression tests ensuring empty-string clinical values are excluded. |
| src/test/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreGenericAssayControllerTest.java | Updates test URLs for generic assay column-store endpoints. |
| src/main/resources/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMapper.xml | Adds optional excludeEmptyValues WHERE clause filters for patient/sample clinical reads. |
| src/main/java/org/cbioportal/legacy/web/util/InvolvedCancerStudyExtractorInterceptor.java | Adjusts path normalization to account for /api/legacy/.... |
| src/main/java/org/cbioportal/legacy/web/util/ContentCachingRequestWrapperFilter.java | Implements an eager cached-body wrapper to allow multiple body reads. |
| src/main/java/org/cbioportal/legacy/web/StudyViewController.java | Moves endpoints under /legacy/... and hides them from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/studyview/CustomDataController.java | Moves endpoint under /legacy/... and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/StudyController.java | Moves study endpoints under /api/legacy/... and hides them from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/SampleController.java | Moves controller under /api/legacy and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/MutationController.java | Moves endpoint under /legacy/... and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/GenericAssayController.java | Moves controller under /api/legacy and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/CoExpressionController.java | Moves controller under /api/legacy and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/ClinicalDataEnrichmentController.java | Moves controller under /api/legacy and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/ClinicalDataController.java | Moves endpoint under /legacy/... and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/web/AlterationEnrichmentController.java | Moves controller under /api/legacy and hides it from OpenAPI. |
| src/main/java/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMyBatisRepository.java | Passes excludeEmptyValues=true to row-returning mapper reads. |
| src/main/java/org/cbioportal/legacy/persistence/mybatis/ClinicalDataMapper.java | Extends mapper method signatures with excludeEmptyValues flag. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreStudyController.java | Moves column-store study endpoints under /api and updates mappings/annotations. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreSampleController.java | Moves endpoints under /api and adds request-attribute fallback for filters. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreMutationController.java | Moves endpoint under /api and adds request-attribute fallback for filters. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreGenericAssayController.java | Moves endpoints under /api and makes them public in OpenAPI. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreCoExpressionController.java | Moves endpoint under /api and updates request mapping style. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreClinicalDataEnrichmentController.java | Moves endpoint under /api and adds request-attribute fallback for filters. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreClinicalDataController.java | Moves endpoint under /api and adds request-attribute fallback for filters. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnarStoreStudyViewController.java | Moves endpoints under /api and updates OpenAPI tagging/visibility. |
| src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnarStoreAlterationEnrichmentController.java | Moves endpoint under /api and adds request-attribute fallback for filters. |
| src/e2e/js/test/ColumnStoreStudyController/ColumnStoreStudyController.spec.ts | Updates e2e JS test URLs to /api/.... |
| src/e2e/js/test/ColumnStoreMutationController/ColumnStoreMutationController.spec.ts | Updates e2e JS test URL to /api/mutations/fetch. |
| src/e2e/js/test/ColumnStoreCoExpressionController/ColumnStoreCoExpressionController.spec.ts | Updates e2e JS test URL to /api/molecular-profiles/co-expressions/fetch. |
| src/e2e/js/src/utils.ts | Updates helper URLs to /api/.... |
| src/e2e/js/src/types.ts | Updates documentation comments for updated endpoint sources. |
| src/e2e/js/README.md | Updates referenced endpoint in docs. |
Comments suppressed due to low confidence (1)
src/main/java/org/cbioportal/application/rest/vcolumnstore/ColumnStoreClinicalDataController.java:132
@PreAuthorizeis still evaluating permissions against#clinicalDataMultiStudyFilter, but the handler now treats the request-body filter as optional and falls back tointerceptedClinicalDataMultiStudyFilter. If the request body is unavailable (e.g., consumed before Spring binds it), authorization will fail before the fallback can be used, and thefilter == nullearly-return is effectively unreachable. Update the SpEL to authorize against the same effective filter (or against the extractedinvolvedCancerStudiesrequest attribute) so the fallback path can work as intended.
@PreAuthorize(
"hasPermission(#clinicalDataMultiStudyFilter, 'ClinicalDataMultiStudyFilter', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)")
@RequestMapping(
method = RequestMethod.POST,
value = "/clinical-data/fetch",
consumes = MediaType.APPLICATION_JSON_VALUE,
produces = MediaType.APPLICATION_JSON_VALUE)
@Operation(description = "Fetch clinical data by patient IDs or sample IDs (all studies)")
@ApiResponse(
responseCode = "200",
description = "OK",
content =
@Content(array = @ArraySchema(schema = @Schema(implementation = ClinicalData.class))))
public ResponseEntity<List<ClinicalDataDTO>> fetchClinicalData(
@Parameter(hidden = true)
@RequestAttribute(required = false, value = "interceptedClinicalDataMultiStudyFilter")
ClinicalDataMultiStudyFilter interceptedClinicalDataMultiStudyFilter,
@Parameter(description = "Type of the clinical data") @RequestParam(defaultValue = "SAMPLE")
ClinicalDataType clinicalDataType,
@Parameter(
required = true,
description = "List of patient or sample identifiers and attribute IDs")
@Valid
@RequestBody(required = false)
ClinicalDataMultiStudyFilter clinicalDataMultiStudyFilter,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -81,14 +86,21 @@ public ColumnStoreClinicalDataEnrichmentController( | |||
| array = | |||
| @ArraySchema(schema = @Schema(implementation = ClinicalDataEnrichmentDTO.class)))) | |||
| public ResponseEntity<List<ClinicalDataEnrichmentDTO>> fetchClinicalEnrichments( | |||
| @Parameter(hidden = true) | |||
| @RequestAttribute(required = false, value = "interceptedGroupFilter") | |||
| GroupFilter interceptedGroupFilter, | |||
| @Parameter(required = true, description = "Group filter with sample identifiers") | |||
| @Valid | |||
| @RequestBody | |||
| @RequestBody(required = false) | |||
| GroupFilter groupFilter) { | |||
| @PreAuthorize( | ||
| "hasPermission(#groupsAndAlterationTypes, 'MolecularProfileCasesGroupAndAlterationTypeFilter', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)") | ||
| @RequestMapping( | ||
| value = "/alteration-enrichments/fetch", | ||
| method = RequestMethod.POST, | ||
| consumes = MediaType.APPLICATION_JSON_VALUE, | ||
| produces = MediaType.APPLICATION_JSON_VALUE) | ||
| @Operation(summary = "Fetch alteration enrichments in molecular profiles") | ||
| @ApiResponse( | ||
| responseCode = "200", | ||
| description = "OK", | ||
| content = | ||
| @Content( | ||
| array = @ArraySchema(schema = @Schema(implementation = AlterationEnrichment.class)))) | ||
| public ResponseEntity<Collection<AlterationEnrichment>> fetchAlterationEnrichments( | ||
| @Parameter(description = "Type of the enrichment e.g. SAMPLE or PATIENT") | ||
| @RequestParam(defaultValue = "SAMPLE") | ||
| EnrichmentType enrichmentType, | ||
| @Parameter(hidden = true) | ||
| @RequestAttribute( | ||
| required = false, | ||
| value = "interceptedMolecularProfileCasesGroupFilters") | ||
| List<MolecularProfileCasesGroupFilter> interceptedGroupFilters, | ||
| @Parameter(hidden = true) @RequestAttribute(required = false, value = "alterationEventTypes") | ||
| AlterationFilter interceptedAlterationEventTypes, | ||
| @Parameter( | ||
| required = true, | ||
| description = | ||
| "List of groups containing sample identifiers and list of Alteration Types") | ||
| @Valid | ||
| @RequestBody(required = false) | ||
| MolecularProfileCasesGroupAndAlterationTypeFilter groupsAndAlterationTypes) |
| @PreAuthorize( | ||
| "hasPermission(#sampleFilter, 'SampleFilter', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)") | ||
| @RequestMapping( | ||
| value = "/samples/fetch", | ||
| method = RequestMethod.POST, | ||
| consumes = MediaType.APPLICATION_JSON_VALUE, | ||
| produces = MediaType.APPLICATION_JSON_VALUE) | ||
| @Operation(description = "Fetch samples by ID") | ||
| @ApiResponse( | ||
| responseCode = "200", | ||
| description = "OK", | ||
| content = @Content(array = @ArraySchema(schema = @Schema(implementation = Sample.class)))) | ||
| public ResponseEntity<List<SampleDTO>> fetchSamples( | ||
| @Parameter(hidden = true) | ||
| @RequestAttribute(required = false, value = "interceptedSampleFilter") | ||
| SampleFilter interceptedSampleFilter, | ||
| @Parameter(required = true, description = "List of sample identifiers") | ||
| @Valid | ||
| @RequestBody(required = false) | ||
| SampleFilter sampleFilter, | ||
| @Parameter(description = "Level of detail of the response") | ||
| @RequestParam(defaultValue = "SUMMARY") | ||
| ProjectionType projection) { | ||
| SampleFilter effectiveFilter = sampleFilter != null ? sampleFilter : interceptedSampleFilter; | ||
| if (effectiveFilter == null) { | ||
| return new ResponseEntity<>(Collections.emptyList(), HttpStatus.OK); |
| @Override | ||
| public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain) | ||
| throws IOException, ServletException { | ||
| ContentCachingRequestWrapper wrappedRequest = | ||
| new ContentCachingRequestWrapper((HttpServletRequest) request); | ||
| CachedBodyRequestWrapper wrappedRequest = | ||
| new CachedBodyRequestWrapper((HttpServletRequest) request); | ||
| LOG.trace("Wrapping request for multiple reads of request body"); | ||
| filterChain.doFilter(wrappedRequest, response); | ||
| } | ||
|
|
||
| /** | ||
| * Request wrapper that eagerly reads and caches the entire request body, then replays it on every | ||
| * {@link #getInputStream()} call. This ensures that both the {@link | ||
| * InvolvedCancerStudyExtractorInterceptor} (which consumes the body to extract study IDs) and the | ||
| * downstream controller (which uses {@code @RequestBody}) can each read the body independently. | ||
| */ | ||
| static class CachedBodyRequestWrapper extends HttpServletRequestWrapper { | ||
| private final byte[] cachedBody; | ||
|
|
||
| CachedBodyRequestWrapper(HttpServletRequest request) throws IOException { | ||
| super(request); | ||
| this.cachedBody = request.getInputStream().readAllBytes(); | ||
| } |
… tests Extend the excludeEmptyValues filter to getMetaSampleClinicalData / getMetaPatientClinicalData so the paginated clinical-data endpoints' totalCount stays consistent with the rows actually returned by the fetch/get reads (otherwise totalCount would still include the empty rows). Add an empty-value fixture entity (TCGA-EMPTY-VAL-PT / -01 in acc_tcga) to clickhouse_data_legacy.sql and regression tests in ClinicalDataMyBatisRepositoryTest covering patient/sample/fetch reads and meta counts. Update PatientMyBatisRepositoryTest global counts (18 -> 19) for the added fixture patient. Verified: all 398 *MyBatisRepositoryTest tests pass against a ClickHouse Testcontainer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|



Summary
Fixes the group-comparison Clinical tab bar chart rendering an unlabeled legend entry where an "NA" category should appear (visible on staging/v7 but not production/v6).
Root cause
The v7 ClickHouse import stores blank clinical cells as empty-string (
'') rows inclinical_patient/clinical_sample, whereas v6 omitted them entirely. The legacyPOST /api/clinical-data/fetch(ClinicalDataController→ClinicalDataMyBatisRepository→ClinicalDataMapper.xml, reading those tables directly) therefore returns a full cartesian — one row per patient/sample × attribute — with''for missing values.The comparison frontend (
groupComparison/ClinicalData.tsx) synthesizes the "NA" legend category only for samples that have no returned row, so the new empty-string values become a distinct category with a blank legend label.Confirmed live for comparison session
5ce411c7e4b0ab4137874076(acc_tcga): staging (v7.0.0) returned 77 patients × 7 attrs = 539 rows with 93 empty values; production (v6.4.4) returned only actual values (446 rows, 0 empty). Both run ClickHouse, so this is a v6→v7 import-behavior difference, not MySQL-vs-ClickHouse.Fix (mapper-level, option 2)
Filter
attr_value != ''in the legacy clinical-data reads via a guardedexcludeEmptyValuesflag, enabled for:getSampleClinicalData/getPatientClinicalData(backingfetchClinicalData,fetchAllClinicalDataInStudy,getAllClinicalDataInStudy,getAllClinicalDataOf{Sample,Patient}InStudy), andgetMetaSampleClinicalData/getMetaPatientClinicalData, so the paginated endpoints'totalCountstays consistent with the rows actually returned.The flag is required because the shared
whereSample/wherePatientfragments are also reused by the value-distribution counts and clinical-table queries — andgetVisibleSampleInternalIdsForClinicalTablehas noclinical_samplein its FROM — so an unconditional filter would alter those (or be a SQL error). Study-view NA counting (columnarclinical_data_derivedpath) is untouched.Testing
clickhouse_data_legacy.sql—TCGA-EMPTY-VAL-PT/TCGA-EMPTY-VAL-PT-01inacc_tcga, each with one non-empty (OTHER_PATIENT_ID/OTHER_SAMPLE_ID) and one empty (FORM_COMPLETION_DATE/DAYS_TO_COLLECTION) attribute.ClinicalDataMyBatisRepositoryTestcovering patient fetch, sample fetch,fetchClinicalData, and meta-count, asserting the empty values are excluded and the non-empty ones returned.PatientMyBatisRepositoryTestglobal counts (18 → 19) for the added fixture patient.*MyBatisRepositoryTesttests pass against a ClickHouse Testcontainer (JDK 21 / Maven).Follow-ups (not in this PR)
getPatientClinicalDataDetailedToSample(clinical-data downloads) still returns blanks if full consistency is desired.🤖 Generated with Claude Code