Add sampleData argument to store multi sample's metadata information (bulk, single-cell, spatial) in se object by lingminhao · Pull Request #542 · GoekeLab/bambu

lingminhao · 2026-02-15T14:09:08Z

This PR allows the user to provide sample-specific metadata using the sampleData argument.

Supported formats:

Bulk data: a single .csv metadata file with a mandatory sampleName column is sufficient; each row holds the metadata for one sample. Supplying a separate .csv file per sample is also possible, but not necessary.
Single-cell / spatial data: one .csv metadata file per single-cell/spatial sample, each containing a mandatory barcode column.

If a specific sample lacks metadata, users can pass NA at the corresponding index of the input vector (e.g., c("metadata_sample1.csv", NA, "metadata_sample3.csv")).

Users can define any additional metadata columns as needed in the metadata .csv file. The metadata is stored in the colData of the returned SummarizedExperiment object (se).
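
A minimal sketch of preparing the inputs described above. The file names, metadata values, and the commented-out bambu() call are illustrative assumptions; only the sampleData argument and the mandatory sampleName / barcode columns come from this PR.

```r
# Illustrative only: file names and metadata values are made up.
tmp <- tempdir()

# Bulk: one CSV covering all samples, with the mandatory sampleName column.
bulk_meta <- data.frame(
  sampleName = c("sample1", "sample2", "sample3"),
  condition  = c("control", "treated", "treated")
)
bulk_csv <- file.path(tmp, "metadata_bulk.csv")
write.csv(bulk_meta, bulk_csv, row.names = FALSE)

# Single-cell / spatial: one CSV per sample, with the mandatory barcode column.
sc_meta1 <- data.frame(
  barcode  = c("AAACCCAAGAAACACT", "AAACCCAAGAAACCAT"),
  cellType = c("T cell", "B cell")
)
sc_meta3 <- data.frame(
  barcode  = c("TTTGGGCCAATTGGCA"),
  cellType = c("NK cell")
)
sc_csv1 <- file.path(tmp, "metadata_sample1.csv")
sc_csv3 <- file.path(tmp, "metadata_sample3.csv")
write.csv(sc_meta1, sc_csv1, row.names = FALSE)
write.csv(sc_meta3, sc_csv3, row.names = FALSE)

# One entry per sample; NA marks a sample without metadata (here sample 2).
sampleData <- c(sc_csv1, NA, sc_csv3)

# Hypothetical call; bamFiles, annotations and genomeFile are assumed to exist:
# se <- bambu(reads = bamFiles, annotations = annotations, genome = genomeFile,
#             sampleData = sampleData)
# SummarizedExperiment::colData(se)
```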

Copilot

Pull request overview

Adds a sampleData argument to bambu() to allow users to attach per-sample (bulk) or per-sample/per-barcode (single-cell/spatial) metadata from CSV files into the output SummarizedExperiment’s colData.

Changes:

Adds sampleData to bambu() and threads it into assignReadClasstoTranscripts().
Reworks generateColData() to left-join user-provided CSV metadata by sampleName (bulk) or barcode (demultiplexed).
Changes multi-sample SE assembly to carry forward per-sample colData into the combined SE.
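
To illustrate what a left join by sampleName does to the colData, here is a dependency-free sketch in base R. It is not the PR's implementation of generateColData(); the data values are made up, and match() is used here only to keep the example free of package dependencies.

```r
# Default colData skeleton: one row per sample (values are illustrative).
col_data <- data.frame(
  id         = c("s1", "s2", "s3"),
  sampleName = c("s1", "s2", "s3")
)

# User metadata as it might be read from a CSV; s2 has no entry.
meta <- data.frame(
  sampleName = c("s3", "s1"),
  condition  = c("treated", "control")
)

# Left join: keep every row (and the row order) of col_data,
# filling NA where the metadata has no matching sampleName.
idx    <- match(col_data$sampleName, meta$sampleName)
extra  <- meta[idx, setdiff(names(meta), "sampleName"), drop = FALSE]
joined <- cbind(col_data, extra)
rownames(joined) <- NULL

print(joined)
```

Samples without a metadata row keep their place in the table and receive NA in the added columns, so the colData stays aligned with the assay columns.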

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

File	Description
R/bambu.R	Adds `sampleData` param and passes it through quantification path; changes how combined `colData` is built.
R/bambu-assignDist.R	Extends `assignReadClasstoTranscripts()` signature to accept `sampleData` and uses new `generateColData()`.
R/bambu_utilityFunctions.R	Updates `combineCountSes()` to accept external colData list and rewrites `generateColData()` to join CSV metadata.
R/bambu-processReads_utilityConstructReadClasses.R	Formatting/brace cleanup in read-class construction.
R/bambu-processReads.R	Removes an unused `warnings` placeholder variable.

ch99l

After the requested code changes, code runs as expected.

jonathangoeke · 2026-03-23T01:59:41Z

R/bambu_utilityFunctions.R

+    colData$sampleName = sub('_[^_]+$', '', samples)
+    colData$barcode <- sub('.*_', '', samples)


Can you add a comment on the expected pattern? That will help in understanding this line/change if needed.
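
For illustration, a commented version of the two lines might look like the sketch below. The id layout "<sampleName>_<barcode>" is inferred from the regular expressions, not confirmed by the PR, and the example ids are made up.

```r
# Assumed id layout for demultiplexed data: "<sampleName>_<barcode>",
# where the barcode itself contains no underscore.
samples <- c("sampleA_AAACCCAAGAAACACT", "my_sample_B_TTTGGGCCAATTGGCA")

# Drop the final "_<barcode>" suffix to recover the sample name.
sampleName <- sub("_[^_]+$", "", samples)

# Greedy ".*_" consumes everything up to the last underscore,
# leaving only the barcode.
barcode <- sub(".*_", "", samples)

print(sampleName)  # "sampleA" "my_sample_B"
print(barcode)     # "AAACCCAAGAAACACT" "TTTGGGCCAATTGGCA"
```

Under this assumption the sample name may contain underscores, but the barcode may not.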

jonathangoeke · 2026-03-23T02:20:30Z

R/bambu.R

+        names(countsSeCompressed.all) <- ColNames   
+
+        countsSe <- combineCountSes(countsSeCompressed.all, colData.all, annotations)


Should ColNames be identical to the names of the colData? Otherwise there would be a mismatch between the sample names in colData and in the assays.
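
One way to guard against such a mismatch is an explicit check before combining. The sketch below uses stand-in objects and assumes colData.all is a list named by sample, which the diff does not confirm.

```r
# Stand-ins for the objects in the diff; the real ones hold
# SummarizedExperiment objects and colData tables.
ColNames <- c("sample1", "sample2")
countsSeCompressed.all <- list(matrix(0, 1, 1), matrix(0, 1, 1))
colData.all <- list(
  sample1 = data.frame(id = "sample1"),
  sample2 = data.frame(id = "sample2")
)

names(countsSeCompressed.all) <- ColNames

# Fail early if assay-side and colData-side names disagree (order included).
stopifnot(identical(names(countsSeCompressed.all), names(colData.all)))
```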

ch99l · 2026-03-23T02:48:33Z

R/bambu_utilityFunctions.R

+  colData <- tibble(id = samples)
+  if (demultiplexed) {
+    colData$sampleName = sub('_[^_]+$', '', samples)
+    colData$barcode <- sub('.*_', '', samples)


The regex gives incorrect results for Visium HD samples, because their barcodes themselves contain underscores (format: s_002um_xpos_ypos).

Instead of a regex, use the information already extracted from the CB column (refer to prepareDataFromBam.R).
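
The failure can be reproduced with a made-up id in the Visium HD format. The regex-free fallback at the end assumes the sample name is known up front; it is only an illustration and is not the CB-column approach suggested above.

```r
# Hypothetical id: barcode "s_002um_00123_00456" appended to "sample1".
id <- "sample1_s_002um_00123_00456"

# The regexes from the diff split at the LAST underscore,
# which lies inside the barcode.
wrongName    <- sub("_[^_]+$", "", id)  # "sample1_s_002um_00123"
wrongBarcode <- sub(".*_", "", id)      # "00456"

# Regex-free fallback, assuming the sample name is already known:
sampleName <- "sample1"
# Skip the sample name plus the "_" separator.
barcode <- substring(id, nchar(sampleName) + 2)

print(barcode)  # "s_002um_00123_00456"
```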

lingminhao added 5 commits February 15, 2026 21:33

tidy up code

14f5547

refactor generateColData to take sampleData as argument

d63fdae

refactor combineCountSes to inherit colData directly from quantData

f86ce9c

update colData for pseudobulk single-cell

ed26b1f

add sampleData argument

f9554b5

lingminhao changed the base branch from devel to devel_pre_v4 February 15, 2026 14:09

lingminhao requested a review from Copilot February 15, 2026 14:09

lingminhao assigned ch99l Feb 15, 2026

Copilot started reviewing on behalf of lingminhao February 15, 2026 14:10

lingminhao added the bambu-dev Feature is implemented in development branch label Feb 15, 2026

Copilot AI reviewed Feb 15, 2026

GoekeLab deleted a comment from Copilot AI Feb 17, 2026

lingminhao added 4 commits February 19, 2026 09:02

remove spatial argument from bambu

0307855

rename colData parameter combineCountSes to colDataList (avoid same name as function)

8e25154

update bambu sampleData parameter description

c29eab4

refine sampleData input check description

bfa131e

ch99l requested changes Feb 20, 2026

lingminhao added 3 commits February 20, 2026 14:00

tidy up spatial & sampleData argument

9c2d8ba

change sampleData to sampleMetadata in assignReadClasstoTranscripts for variable clarity

fda300b

fix bug: omit the check for NA elements in sampleData

e80cb39

ch99l requested changes Mar 2, 2026

allow . csv/.tsv/.txt file input type in sampleData

7d60435

lingminhao force-pushed the generateColData branch from 65c11ec to 7d60435 Compare March 3, 2026 07:19

ch99l approved these changes Mar 9, 2026

ch99l requested a review from SuiYue-2308 March 9, 2026 10:04

jonathangoeke reviewed Mar 23, 2026

ch99l requested changes Mar 23, 2026

lingminhao wants to merge 13 commits into devel_pre_v4 from generateColData

4 participants
