Add sampleData argument to store multi sample's metadata information (bulk, single-cell, spatial) in se object #542
lingminhao wants to merge 13 commits into devel_pre_v4
Pull request overview
Adds a sampleData argument to bambu() to allow users to attach per-sample (bulk) or per-sample/per-barcode (single-cell/spatial) metadata from CSV files into the output SummarizedExperiment’s colData.
Changes:
- Adds `sampleData` to `bambu()` and threads it into `assignReadClasstoTranscripts()`.
- Reworks `generateColData()` to left-join user-provided CSV metadata by `sampleName` (bulk) or `barcode` (demultiplexed).
- Changes multi-sample SE assembly to carry forward per-sample `colData` into the combined SE.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| R/bambu.R | Adds sampleData param and passes it through quantification path; changes how combined colData is built. |
| R/bambu-assignDist.R | Extends assignReadClasstoTranscripts() signature to accept sampleData and uses new generateColData(). |
| R/bambu_utilityFunctions.R | Updates combineCountSes() to accept external colData list and rewrites generateColData() to join CSV metadata. |
| R/bambu-processReads_utilityConstructReadClasses.R | Formatting/brace cleanup in read-class construction. |
| R/bambu-processReads.R | Removes an unused warnings placeholder variable. |
65c11ec to 7d60435
ch99l left a comment
After the requested code changes, code runs as expected.
    colData$sampleName = sub('_[^_]+$', '', samples)
    colData$barcode <- sub('.*_', '', samples)
Can you add a comment on the expected pattern? That will help readers understand this line/change if needed.
    names(countsSeCompressed.all) <- ColNames

    countsSe <- combineCountSes(countsSeCompressed.all, colData.all, annotations)
Should ColNames be identical to the names of the colData? Otherwise there is a mismatch between the sample names in colData and in the assays.
    colData <- tibble(id = samples)
    if (demultiplexed) {
        colData$sampleName = sub('_[^_]+$', '', samples)
        colData$barcode <- sub('.*_', '', samples)
The code works incorrectly for Visium HD samples: the regex breaks because their barcodes contain underscores, in the format s_002um_xpos_ypos.
Instead of a regex, use the barcode information already extracted in the CB column (refer to prepareDataFromBam.R).
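To make the failure mode concrete, here is a minimal sketch that applies the two patterns quoted in the diff to a pair of made-up column ids. It assumes ids are built as `<sampleName>_<barcode>`, which is inferred from the regex rather than confirmed here; the sample names and coordinates are hypothetical.

```r
# Hypothetical column ids, assumed to follow <sampleName>_<barcode>.
samples <- c(
  "sampleA_ACGTACGT",            # barcode without underscores
  "sampleB_s_002um_00010_00020"  # Visium HD style barcode: s_002um_xpos_ypos
)

# The two patterns from the diff: strip the last "_..." chunk to get the
# sample name, and strip everything up to the last "_" to get the barcode.
sampleName <- sub('_[^_]+$', '', samples)
barcode <- sub('.*_', '', samples)

print(data.frame(id = samples, sampleName = sampleName, barcode = barcode))
```

For the second id, the split lands on the last underscore, so `sampleName` comes out as `sampleB_s_002um_00010` and `barcode` as `00020`, rather than `sampleB` and `s_002um_00010_00020`.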
This PR allows the user to provide sample-specific metadata using the `sampleData` argument.

Format Supported:
- Bulk: a `.csv` metadata file with a mandatory `sampleName` column is sufficient. Every row then contains the metadata for one sample. Providing a separate `.csv` file for each sample is possible, but not necessary.
- Single-cell/spatial: a `.csv` metadata file per single-cell/spatial sample, each containing a mandatory `barcode` column.

If a specific sample lacks metadata, users can simply pass an `NA` value at the corresponding index in the input vector (e.g., `c("metadata_sample1.csv", NA, "metadata_sample3.csv")`).

Users can then define any additional metadata columns as needed in the metadata `.csv` file. The metadata will be stored in the `colData` of the `se` SummarizedExperiment object.
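As an illustration of the intended result, the following is a minimal base-R sketch of a left join of user metadata onto `colData` by `barcode`. It is not the PR's `generateColData()` implementation; the data frames, the `cellType` column, and the barcode values are hypothetical stand-ins for what would be read from a metadata `.csv` file.

```r
# Hypothetical colData for one demultiplexed sample.
colData <- data.frame(
  id = c("s1_AAAC", "s1_CCCG", "s1_GGGT"),
  sampleName = "s1",
  barcode = c("AAAC", "CCCG", "GGGT"),
  stringsAsFactors = FALSE
)

# Hypothetical user metadata, as it might look after reading the .csv file.
# Rows are in a different order and one barcode (CCCG) has no entry.
metadata <- data.frame(
  barcode = c("GGGT", "AAAC"),
  cellType = c("T cell", "B cell"),
  stringsAsFactors = FALSE
)

# Left join that keeps every row of colData in its original order:
# match() returns NA for barcodes absent from the metadata, which
# yields NA in the joined columns.
idx <- match(colData$barcode, metadata$barcode)
extra <- metadata[idx, setdiff(names(metadata), "barcode"), drop = FALSE]
rownames(extra) <- NULL
joined <- cbind(colData, extra)

print(joined)
```

Using `match()` rather than `merge()` keeps the row order of `colData` unchanged, which matters if the rows must stay aligned with the assay columns.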