
Add sampleData argument to store multi-sample metadata (bulk, single-cell, spatial) in se object #542

Open

lingminhao wants to merge 13 commits into devel_pre_v4 from generateColData


@lingminhao
Collaborator

@lingminhao commented Feb 15, 2026

This PR allows the user to provide sample-specific metadata using the sampleData argument.

Supported formats:

  • Bulk Data: a single .csv metadata file with a mandatory sampleName column is sufficient; each row holds the metadata for one sample. Supplying a separate .csv file per sample is also possible, but not necessary.
  • Single-Cell / Spatial Data: one .csv metadata file per single-cell/spatial sample, each containing a mandatory barcode column.

If a specific sample lacks metadata, users can simply pass NA at the corresponding index of the input vector (e.g., c("metadata_sample1.csv", NA, "metadata_sample3.csv")).

Any additional metadata columns can be defined as needed in the metadata .csv file. The metadata are stored in the colData of the returned SummarizedExperiment object (se).
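The join described above can be sketched in base R as follows. This is a minimal illustration, not the code in this PR: the helper names `readSampleData()` and `joinSampleMetadata()` are hypothetical, and the sketch assumes colData is a data.frame that already has a `sampleName` (bulk) or `barcode` (single-cell/spatial) column to join on.

```r
# Hypothetical helpers (not part of bambu): a base-R sketch of a left join
# of one metadata .csv onto a colData-like data.frame.

# Read one entry of sampleData; NA means "no metadata for this sample".
readSampleData <- function(path) {
    if (is.na(path)) return(NULL)
    read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
}

# Left join: every colData row is kept in its original order; rows without a
# match in `meta` get NA in the added columns. `key` is "sampleName" (bulk)
# or "barcode" (single-cell/spatial).
joinSampleMetadata <- function(colData, meta, key) {
    if (is.null(meta)) return(colData)
    if (!key %in% colnames(meta)) {
        stop("metadata file must contain a '", key, "' column")
    }
    idx <- match(colData[[key]], meta[[key]])
    extra <- meta[idx, setdiff(colnames(meta), key), drop = FALSE]
    rownames(extra) <- NULL
    cbind(colData, extra)
}

# Usage: bulk data, one metadata file covering two of three samples.
colData <- data.frame(id = c("sample1", "sample2", "sample3"),
                      sampleName = c("sample1", "sample2", "sample3"))
csv <- tempfile(fileext = ".csv")
writeLines(c("sampleName,condition", "sample1,control", "sample3,treated"), csv)
out <- joinSampleMetadata(colData, readSampleData(csv), key = "sampleName")
out$condition
```

Using match() keeps the colData row order fixed, which matters because colData rows must line up with the assay columns. Note that match() returns the first matching row, so duplicated keys in a .csv would be silently ignored; a real implementation may want to reject them explicitly.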

@lingminhao changed the base branch from devel to devel_pre_v4 February 15, 2026 14:09
@lingminhao requested a review from Copilot February 15, 2026 14:09
@lingminhao added the bambu-dev label (Feature is implemented in development branch) February 15, 2026

Copilot AI left a comment


Pull request overview

Adds a sampleData argument to bambu() to allow users to attach per-sample (bulk) or per-sample/per-barcode (single-cell/spatial) metadata from CSV files into the output SummarizedExperiment’s colData.

Changes:

  • Adds sampleData to bambu() and threads it into assignReadClasstoTranscripts().
  • Reworks generateColData() to left-join user-provided CSV metadata by sampleName (bulk) or barcode (demultiplexed).
  • Changes multi-sample SE assembly to carry forward per-sample colData into the combined SE.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Changes per file:

  • R/bambu.R: Adds the sampleData parameter and passes it through the quantification path; changes how the combined colData is built.
  • R/bambu-assignDist.R: Extends the assignReadClasstoTranscripts() signature to accept sampleData and uses the new generateColData().
  • R/bambu_utilityFunctions.R: Updates combineCountSes() to accept an external colData list and rewrites generateColData() to join CSV metadata.
  • R/bambu-processReads_utilityConstructReadClasses.R: Formatting/brace cleanup in read-class construction.
  • R/bambu-processReads.R: Removes an unused warnings placeholder variable.


@GoekeLab deleted a comment from Copilot AI Feb 17, 2026
Collaborator

@ch99l left a comment

After the requested code changes, code runs as expected.

@ch99l ch99l requested a review from SuiYue-2308 March 9, 2026 10:04
Comment on lines +315 to +316
colData$sampleName = sub('_[^_]+$', '', samples)
colData$barcode <- sub('.*_', '', samples)
Member

Can you add a comment on the expected pattern? That will help readers understand this line/change if needed.
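One way to address this is to annotate the two lines with the id pattern they assume. The pattern below ("<sampleName>_<barcode>", split on the last underscore) is inferred from the regexes themselves rather than from bambu documentation, and the example ids are made up.

```r
# Expected column id pattern: "<sampleName>_<barcode>". The barcode is the
# text after the LAST underscore (so it must not itself contain "_"), while
# the sample name may contain underscores.
samples <- c("sample1_AAACCTGAGAAGGCCT", "my_sample_2_TTTGTCATCTGCGGCA")
sampleName <- sub("_[^_]+$", "", samples)  # strip the final "_<barcode>"
barcode <- sub(".*_", "", samples)         # greedy ".*_" consumes up to the last "_"
sampleName
barcode
```

Stating the assumption that barcodes contain no underscore makes the limitation visible, which is exactly what fails for the spatial barcodes discussed further down.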

Comment on lines +333 to +335
names(countsSeCompressed.all) <- ColNames

countsSe <- combineCountSes(countsSeCompressed.all, colData.all, annotations)
Member

Shouldn't ColNames be identical to the names in colData? Otherwise there is a mismatch between the sample names in colData and those in the assays.
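A guard against the mismatch raised here could look like the sketch below. The diff does not show the exact structure of `colData.all` or how `ColNames` is derived, so this uses a plain count matrix and a data.frame with an `id` column (as in `tibble(id = samples)` above); `alignColData()` is a hypothetical helper, not a bambu function.

```r
# Hypothetical helper (not in bambu): reorder colData so that its rows follow
# the column names of the count matrix, failing loudly on any mismatch.
alignColData <- function(counts, colData) {
    idx <- match(colnames(counts), colData$id)
    if (anyNA(idx)) {
        stop("columns without colData: ",
             paste(colnames(counts)[is.na(idx)], collapse = ", "))
    }
    out <- colData[idx, , drop = FALSE]
    rownames(out) <- out$id
    out
}

# Usage: the assay columns are in a different order than the colData rows.
counts <- matrix(1:4, nrow = 2,
                 dimnames = list(c("tx1", "tx2"), c("s2", "s1")))
colData <- data.frame(id = c("s1", "s2"), condition = c("control", "treated"))
aligned <- alignColData(counts, colData)
aligned$condition
```

Matching by name instead of relying on positional order means a silent reordering of samples turns into either a correct realignment or an explicit error.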

colData <- tibble(id = samples)
if (demultiplexed) {
colData$sampleName = sub('_[^_]+$', '', samples)
colData$barcode <- sub('.*_', '', samples)
Collaborator


This code gives incorrect results for Visium HD samples because of the regex: their barcodes have the format s_002um_xpos_ypos, which itself contains underscores.

Instead of a regex, use the information extracted earlier from the CB column (see prepareDataFromBam.R).
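The failure can be reproduced directly with the two regexes from the diff; the Visium HD positions below are made-up values. The second part is a hypothetical alternative, `splitByKnownSample()`, which is *not* the CB-column approach suggested here (its details live in prepareDataFromBam.R and are not shown). It only assumes that the sample names are known in advance.

```r
# The regexes split on the LAST underscore, which breaks for Visium HD
# barcodes such as "s_002um_00010_00020".
ids <- c("sampleA_s_002um_00010_00020", "sampleB_AAACCTGAGAAGGCCT")
sub("_[^_]+$", "", ids)  # "sampleA_s_002um_00010" "sampleB"
sub(".*_", "", ids)      # "00020" "AAACCTGAGAAGGCCT"

# Hypothetical alternative: split on the known sample names instead of on
# the shape of the barcode.
splitByKnownSample <- function(ids, sampleNames) {
    # Longest names first, so a name such as "A" cannot claim ids of "A_B".
    sampleNames <- sampleNames[order(nchar(sampleNames), decreasing = TRUE)]
    sampleName <- rep(NA_character_, length(ids))
    for (s in sampleNames) {
        hit <- is.na(sampleName) & startsWith(ids, paste0(s, "_"))
        sampleName[hit] <- s
    }
    if (anyNA(sampleName)) {
        stop("ids with unknown sample name: ",
             paste(ids[is.na(sampleName)], collapse = ", "))
    }
    # Everything after "<sampleName>_" is the barcode, underscores included.
    barcode <- substring(ids, nchar(sampleName) + 2L)
    data.frame(id = ids, sampleName = sampleName, barcode = barcode)
}

splitByKnownSample(ids, c("sampleA", "sampleB"))
```

Anchoring the split on the sample name makes the barcode format irrelevant, but it still depends on string parsing; carrying the barcode through from the CB column, as proposed in this review, avoids parsing altogether.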


4 participants