-
Notifications
You must be signed in to change notification settings - Fork 2
Roam: ENG-733 Glue Settings UI components to database operations #343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roam: ENG-733 Glue Settings UI components to database operations #343
Conversation
This pull request has been ignored for the connected project Preview Branches by Supabase. |
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughRefactors orphaned-node cleanup to use explicit Supabase client/context; updates embeddings utility types/model/batch size; overhauls discourse-node retrieval to per-type with extension settings; adds a new sync orchestrator that coordinates task lifecycle, embeddings upsert, concept conversion, and cleanup; introduces a helper module to upsert nodes as content with embeddings. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Ext as Roam Extension
participant Sync as createOrUpdateDiscourseEmbedding
participant SB as Supabase RPC/API
participant Emb as Embedding Service
Ext->>Sync: Trigger sync
Sync->>SB: propose_sync_task()
SB-->>Sync: SyncTaskInfo (worker/timeout)
Sync->>Ext: get nodeTypes & settings
Sync->>Ext: getAllDiscourseNodesSince(since, nodeTypes, extensionAPI)
Sync->>Emb: fetchEmbeddingsForNodes(batches)
Emb-->>Sync: vectors
Sync->>SB: upsert_content(batched)
Sync->>SB: upsert_concepts(ordered)
Sync->>SB: cleanupOrphanedNodes(client, context)
Sync->>SB: end_sync_task(status="complete")
sequenceDiagram
autonumber
participant Caller
participant Clean as cleanupOrphanedNodes
participant SB as Supabase
Caller->>Clean: cleanupOrphanedNodes(client, context)
Clean->>SB: fetch schema node UIDs
SB-->>Clean: schema UIDs
Clean->>SB: fetch block content nodes
SB-->>Clean: block node UIDs
Clean->>SB: delete orphaned schemas (+content/concepts/docs)
SB-->>Clean: deleted counts
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (8)
apps/roam/src/utils/cleanupOrphanedNodes.ts (3)
16-19
: Redundant null check forcontext
The
context
parameter is typed as non-nullableSupabaseContext
, but Line 16 still checks if it's falsy. This check is unnecessary given the type signature.Remove the redundant check:
- if (!context) { - console.error("Failed to get Supabase context"); - return []; - } const { spaceId } = context;
102-105
: Duplicate null check patternSimilar to the previous comment, this null check is redundant given the non-nullable type signature.
Remove the redundant check:
- if (!context) { - console.error("Failed to get Supabase context"); - return []; - }
40-66
: Performance: Consider chunking large IN clause queriesWhen
schemaIds
contains many elements, the IN clause at Line 52 could potentially hit database query limits or performance issues.Consider chunking large arrays:
if (schemaIds.length > 0) { + const CHUNK_SIZE = 500; // Adjust based on DB limits + const chunks = []; + for (let i = 0; i < schemaIds.length; i += CHUNK_SIZE) { + chunks.push(schemaIds.slice(i, i + CHUNK_SIZE)); + } + + const results = []; + for (const chunk of chunks) { const conceptResponse = await supabaseClient .from("Concept") .select( ` Content!inner ( source_local_id ) `, ) .eq("space_id", spaceId) .eq("is_schema", false) - .in("schema_id", schemaIds) + .in("schema_id", chunk) .not("Content.source_local_id", "is", null); if (conceptResponse.error) { console.error( "Failed to get concepts from Supabase:", conceptResponse.error, ); return []; } + results.push(...(conceptResponse.data || [])); + } nodeResult = - conceptResponse.data + results ?.map((c) => c.Content?.source_local_id) .filter((id): id is string => !!id) || []; }apps/roam/src/utils/getAllDiscourseNodesSince.ts (1)
149-171
: Check for duplicates when filtering node typesThe
nodeTypeSince
function doesn't validate whethernodeTypes
contains duplicates, which could lead to redundant queries if the same node type appears multiple times.Add deduplication:
export const nodeTypeSince = async ( since: ISODateString, nodeTypes: DiscourseNode[], ) => { const sinceMs = new Date(since).getTime(); + // Deduplicate node types by their type property + const uniqueNodeTypes = Array.from( + new Map(nodeTypes.map(node => [node.type, node])).values() + ); const filterMap = await Promise.all( - nodeTypes.map((node) => { + uniqueNodeTypes.map((node) => { const query = ` [:find ?node-title :in $ ?since ?type :where [?node :block/uid ?type] [?node :node/title ?node-title] [?node :edit/time ?nodeEditTime] [(> ?nodeEditTime ?since)]] `; const result = window.roamAlphaAPI.data.q(query, sinceMs, node.type); return result.length > 0; }), ); - const nodesSince = nodeTypes.filter((_, index) => filterMap[index]); + const nodesSince = uniqueNodeTypes.filter((_, index) => filterMap[index]); return nodesSince; };apps/roam/src/utils/syncDgNodesToSupabase.ts (1)
241-241
: Destructuring unused variableThe destructuring includes a variable that isn't used in the subsequent code.
Remove the unused variable:
- const { ordered } = orderConceptsByDependency(conceptsToUpsert); - const { error } = await supabaseClient.rpc("upsert_concepts", { + const { ordered } = orderConceptsByDependency(conceptsToUpsert); + const { error } = await supabaseClient.rpc("upsert_concepts", {apps/roam/src/utils/upsertNodesAsContentWithEmbeddings.ts (3)
20-20
: Remove debug console.log statementDebug logging statement should be removed from production code.
const allEmbeddings: number[][] = []; - console.log("nodes", nodes); const allNodesTexts = nodes.map((node) =>
126-130
: Redundant context checkThe context parameter is already destructured on Line 130, but there's a null check on Line 126. Since TypeScript enforces non-nullable types, this check is redundant.
Remove the redundant check:
): Promise<void> => { - if (!context) { - console.error("No Supabase context found."); - return; - } const { spaceId, userId } = context;
108-108
: Type assertion withas any
bypasses type safetyUsing
as any
for the data parameter bypasses TypeScript's type checking and could hide type mismatches.Consider proper typing or documenting why the assertion is necessary:
const { error } = await supabaseClient.rpc("upsert_content", { - data: contents as any, + // TODO: Fix type mismatch between LocalContentDataInput and RPC expected type + data: contents as any, // Type assertion needed due to RPC type definition v_space_id: spaceId,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
apps/roam/src/utils/cleanupOrphanedNodes.ts
(3 hunks)apps/roam/src/utils/fetchEmbeddingsForNodes.ts
(3 hunks)apps/roam/src/utils/getAllDiscourseNodesSince.ts
(1 hunks)apps/roam/src/utils/syncDgNodesToSupabase.ts
(1 hunks)apps/roam/src/utils/upsertNodesAsContentWithEmbeddings.ts
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: maparent
PR: DiscourseGraphs/discourse-graph#182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.
📚 Learning: 2025-05-30T14:49:24.016Z
Learnt from: maparent
PR: DiscourseGraphs/discourse-graph#182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.
Applied to files:
apps/roam/src/utils/upsertNodesAsContentWithEmbeddings.ts
apps/roam/src/utils/syncDgNodesToSupabase.ts
apps/roam/src/utils/fetchEmbeddingsForNodes.ts
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
PR: DiscourseGraphs/discourse-graph#232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Applied to files:
apps/roam/src/utils/fetchEmbeddingsForNodes.ts
apps/roam/src/utils/getAllDiscourseNodesSince.ts
📚 Learning: 2025-06-22T10:40:21.679Z
Learnt from: sid597
PR: DiscourseGraphs/discourse-graph#232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:15-16
Timestamp: 2025-06-22T10:40:21.679Z
Learning: In the getAllDiscourseNodesSince function in apps/roam/src/utils/getAllDiscourseNodesSince.ts, date validation is performed before the function is called, so additional date validation within the function is not needed.
Applied to files:
apps/roam/src/utils/getAllDiscourseNodesSince.ts
🧬 Code Graph Analysis (5)
apps/roam/src/utils/cleanupOrphanedNodes.ts (3)
packages/ui/src/lib/supabase/client.ts (1)
DGSupabaseClient
(9-13)packages/database/types.gen.ts (1)
Database
(9-819)apps/roam/src/utils/supabaseContext.ts (3)
SupabaseContext
(21-26)getSupabaseContext
(50-88)getLoggedInClient
(92-110)
apps/roam/src/utils/upsertNodesAsContentWithEmbeddings.ts (6)
apps/roam/src/utils/fetchEmbeddingsForNodes.ts (1)
fetchEmbeddingsForNodes
(13-70)apps/roam/src/utils/getAllDiscourseNodesSince.ts (1)
RoamDiscourseNodeData
(8-18)packages/ui/src/lib/supabase/client.ts (1)
DGSupabaseClient
(9-13)apps/roam/src/utils/supabaseContext.ts (1)
SupabaseContext
(21-26)packages/database/inputTypes.ts (1)
LocalContentDataInput
(6-12)apps/roam/src/utils/syncDgNodesToSupabase.ts (1)
upsertNodesToSupabaseAsContentWithEmbeddings
(252-331)
apps/roam/src/utils/syncDgNodesToSupabase.ts (6)
apps/roam/src/utils/supabaseContext.ts (3)
getLoggedInClient
(92-110)getSupabaseContext
(50-88)SupabaseContext
(21-26)packages/ui/src/lib/supabase/client.ts (1)
DGSupabaseClient
(9-13)apps/roam/src/utils/getAllDiscourseNodesSince.ts (3)
RoamDiscourseNodeData
(8-18)nodeTypeSince
(148-171)getAllDiscourseNodesSince
(76-146)packages/database/inputTypes.ts (1)
LocalContentDataInput
(6-12)apps/roam/src/utils/conceptConversion.ts (5)
discourseNodeSchemaToLocalConcept
(54-66)discourseRelationSchemaToLocalConcept
(92-115)discourseNodeBlockToLocalConcept
(68-88)discourseRelationDataToLocalConcept
(117-168)orderConceptsByDependency
(218-231)apps/roam/src/utils/cleanupOrphanedNodes.ts (1)
cleanupOrphanedNodes
(373-403)
apps/roam/src/utils/fetchEmbeddingsForNodes.ts (1)
apps/roam/src/utils/getAllDiscourseNodesSince.ts (1)
RoamDiscourseNodeData
(8-18)
apps/roam/src/utils/getAllDiscourseNodesSince.ts (1)
apps/obsidian/src/utils/getDiscourseNodeFormatExpression.ts (1)
getDiscourseNodeFormatExpression
(1-9)
export const getDiscourseNodeTypeWithSettingsBlockNodes = ( | ||
node: DiscourseNode, | ||
sinceMs: number, | ||
extensionAPI: OnloadArgs["extensionAPI"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update this to use configTree when this is merged to main and rebase the UI branch off it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to see what it looks like with a better type boundary, and then I'll look again at some of the update logic (but that looks good overall.)
} | ||
}; | ||
|
||
export const convertDgToSupabaseConcepts = async ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This converts and uploads. I think the name should reflect this. (Or better still, have this function return the converted concepts, and another function that calls this and upload. It'll be easier to test the former.)
Ok, thanks a lot for the deduplication and type boundaries, much clearer to read. |
@maparent Yeah I was not sure if we want to include the relations now, I think we should leave it for now, I will remove it from the function, thanks. The function will be called from the UI component I will update the callers on that pr #327 We want to merge this pr first then #344 then #327 then #340 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good. I'll trust you checked the runtime. LGTM.
…ase' of https://github.com/DiscourseGraphs/discourse-graph into eng-733-glue-ui-components-to-data-extraction-and-database
@maparent it worked for me when I tested it a few times then I pushed the commit and asked you for review, now when I tried to test it again on a different graph after reading your exp. Its failing on my end also I am getting the error:
There is another type of bug that I am getting like
This bug I don't know refers to what, checking data that is passed on to upsert_content every data entry has the same All the bugs are quite confusing because I tried this in a personal graph which has 4 nodes to test it on small batch and it works (it also worked yesterday) ![]() ![]() |
Ok. First apologies: The commit was a red herring, I was tired yesterday. |
Bulk functions pushed in #357 |
Add functions to upsert platform accounts (individually or in bulk) Keep the old create_account_in_space function as a shim for now. Use the new upsert_account in upsert_documents and upsert_content, allowing for more complete inline information.
Yeah the personal graph only has me as the user but the other graph |
Add functions to upsert platform accounts (individually or in bulk) Keep the old create_account_in_space function as a shim for now. Use the new upsert_account in upsert_documents and upsert_content, allowing for more complete inline information.
…iscourseGraphs/discourse-graph into eng-764-bulk-account-uploading
…mponents-to-data-extraction-and-database
Add functions to upsert platform accounts (individually or in bulk) Keep the old create_account_in_space function as a shim for now. Use the new upsert_account in upsert_documents and upsert_content, allowing for more complete inline information.
…iscourseGraphs/discourse-graph into eng-764-bulk-account-uploading
…mponents-to-data-extraction-and-database
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this basically lgtm. One small nit before you push: There are a lot of imports that could be import type {...} ...
, or at least import {x, type y} ...
, fwiw.
? `${node.node_title} ${node.text}` | ||
: node.text; | ||
return { | ||
author_id: userId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... this is wrong, as it makes the current user the author of all nodes. In theory, you should not have to set this, since you set author_local_id
; but in practice, the author_local_id may not have been materialized in the database, which may lead to the insert failing. So make sure to collect all author_ids and creator_ids, and to create platformAccounts (prob. with createAccountInSpace; I will create a bulk version of that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see this comment before I need to learn the github ui better
]`; | ||
|
||
//@ts-ignore - backend to be added to roamjs-components | ||
const allNodes = (await window.roamAlphaAPI.data.async.q( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sid597 This should probably be .backend
. We can discuss at our next meeting.
[(> ?nodeEditTime ?since)]] | ||
]`; | ||
|
||
const blockNode = window.roamAlphaAPI.data.q( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sid597 This should probably be .backend
. We can discuss at our next meeting.
if (!entity.source_local_id) { | ||
return []; | ||
} | ||
const node = findDiscourseNode(entity.source_local_id, discourseNodes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sid597 This will likely need to be a custom function. findDiscourseNode
will likely be a performance issue on large graphs. We can discuss at our next meeting.
Summary by CodeRabbit
New Features
Performance
Reliability
Developer Experience