-
Notifications
You must be signed in to change notification settings - Fork 4
ENG-1074 Prod duplicate node alert on page #564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ENG-1074 Prod duplicate node alert on page #564
Conversation
|
This pull request has been ignored for the connected project Preview Branches by Supabase. |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughThis PR integrates vector similarity matching into the Roam note-taking application. It introduces new UI components and utilities to detect and display similar notes based on vector embeddings, with integration into the create-node dialog and page title observers for discourse nodes. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CreateNodeDialog
participant Debounce
participant VectorDuplicateMatches
participant searchAPI[findSimilarNodesVectorOnly]
participant Supabase
User->>CreateNodeDialog: Type title
CreateNodeDialog->>Debounce: Trigger 500ms debounce
Note over Debounce: Wait 500ms for input stabilization
Debounce->>VectorDuplicateMatches: Pass debouncedTitle
VectorDuplicateMatches->>searchAPI: Fetch similar nodes
searchAPI->>Supabase: Query vector embeddings
Supabase-->>searchAPI: Return matching nodes with scores
searchAPI-->>VectorDuplicateMatches: Return VectorMatch[] results
VectorDuplicateMatches->>VectorDuplicateMatches: Render collapsible panel
VectorDuplicateMatches-->>User: Display similar notes list
sequenceDiagram
participant Observer
participant initializeObserversAndListeners
participant useNodeContext
participant renderPossibleDuplicates
participant DiscourseNode[DiscourseNode Utils]
Observer->>initializeObserversAndListeners: Page title detected
initializeObserversAndListeners->>DiscourseNode: Check if discourse node
alt Is Discourse Node
DiscourseNode-->>initializeObserversAndListeners: true
initializeObserversAndListeners->>renderPossibleDuplicates: Call with H1 element
renderPossibleDuplicates->>renderPossibleDuplicates: Mount VectorDuplicateMatches
renderPossibleDuplicates-->>Observer: Component rendered in header
else Not Discourse Node
DiscourseNode-->>initializeObserversAndListeners: false
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Pre-merge checks✅ Passed checks (3 passed)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
apps/roam/src/utils/initializeObserversAndListeners.ts (1)
55-57: Discourse-node duplicate rendering integration looks correct; consider guarding against empty UIDs.The flow of resolving the page UID via
getPageUidByPageTitleand gatingrenderPossibleDuplicatesbehindisDiscourseNode(uid)is sound and matches how discourse-node UIDs are handled elsewhere. As a small polish, you could avoid callingisDiscourseNodewhen Roam returns an empty UID:- const uid = getPageUidByPageTitle(title); - if (isDiscourseNode(uid)) { + const uid = getPageUidByPageTitle(title); + if (uid && isDiscourseNode(uid)) { renderPossibleDuplicates(h1, title); }Based on learnings, using
getPageUidByPageTitlesynchronously here is correct.Also applies to: 89-92
apps/roam/src/utils/useNodeContext.ts (2)
13-36: Content extraction from formatted titles is well-aligned with placeholder semantics.The logic of deriving placeholders from
node.format, usinggetDiscourseNodeFormatExpression, and preferring a{content}placeholder (falling back to the first capture) is a good match for how format expressions are built. If you find yourself needing the same behavior outside Roam (e.g., the Obsidian util), consider lifting this into a sharedextractContentFromTitlehelper to avoid divergence between apps.
38-69: useNodeContext matching strategy looks solid; UID-first then title-based fallback.Resolving the discourse node by
pageUidviafindDiscourseNodeand then falling back tomatchDiscourseNode({ ...node, title: pageTitle })gives a sensible priority order, and returningnullwhen nothing matches lets callers cleanly short‑circuit (asVectorDuplicateMatchesdoes). If you expect empty or transient titles, you could optionally short‑circuit early in the effect when!pageTitle.trim()to skipgetDiscourseNodes()work, but the current implementation is correct as is.Based on learnings, using the Roam page UID with
findDiscourseNode(whereDiscourseNode.typeis the UID field) is consistent with existing conventions.apps/roam/src/components/VectorDuplicateMatches.tsx (2)
8-21: Avoid duplicating the vector match shape; import the shared type from hyde.ts.
VectorMatchItemmirrors theVectorMatchtype exposed fromhyde.ts, and the castconst vectorSearch = findSimilarNodesVectorOnly as (...) => Promise<VectorMatchItem[]>;will silently drift if the underlying type ever changes. To keep things aligned, consider reusing the exported type instead of re-declaring and casting:-import type { Result } from "~/utils/types"; -import { findSimilarNodesVectorOnly } from "../utils/hyde"; +import type { Result } from "~/utils/types"; +import { findSimilarNodesVectorOnly, type VectorMatch } from "../utils/hyde"; @@ -type VectorMatchItem = { - node: Result; - score: number; -}; +type VectorMatchItem = VectorMatch & { node: Result }; @@ -const vectorSearch = findSimilarNodesVectorOnly as ( - params: VectorSearchParams, -) => Promise<VectorMatchItem[]>; +const vectorSearch = (params: VectorSearchParams) => + findSimilarNodesVectorOnly(params);
95-157: UI rendering and DOM mounting for duplicate matches are consistent with the Roam environment.The conditional rendering (no panel when
activeContextis null, spinner vs. "No matches found" vs. list) reads cleanly, and the list items correctly open the matched node in the right sidebar. TherenderPossibleDuplicateshelper’s use of a dedicated container class plusdata-page-titleto reuse or recreate the mount point, and repositioning relative to the H1’s container, should keep the injected UI stable across title changes and re-renders in Roam’s header.Also applies to: 160-199
apps/roam/src/utils/hyde.ts (1)
59-63: RPC signatures verified; defensive UID filtering remains a good practice.The parameter names and JSON serialization approach are correct—
p_query_embedding(string-serialized vector) andp_subset_roam_uids(text array) match the SQL function definition, andJSON.stringify()is the established pattern across the codebase for vector RPC calls. ThefindSimilarNodesVectorOnlyfunction uses the same serialization strategy withmatch_content_embeddings.The defensive filter suggestion still holds as a safeguard against unexpected falsy values:
- const subsetRoamUids = indexData.map((node) => node.uid); + const subsetRoamUids = indexData + .map((node) => node.uid) + .filter((uid): uid is string => !!uid);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
apps/roam/src/components/CreateNodeDialog.tsx(3 hunks)apps/roam/src/components/VectorDuplicateMatches.tsx(1 hunks)apps/roam/src/utils/hyde.ts(3 hunks)apps/roam/src/utils/initializeObserversAndListeners.ts(2 hunks)apps/roam/src/utils/useNodeContext.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-06-22T10:40:52.752Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 232
File: apps/roam/src/utils/getAllDiscourseNodesSince.ts:18-31
Timestamp: 2025-06-22T10:40:52.752Z
Learning: In apps/roam/src/utils/getAllDiscourseNodesSince.ts, the user confirmed that querying for `?title` with `:node/title` and mapping it to the `text` field in the DiscourseGraphContent type is the correct implementation for retrieving discourse node content from Roam Research, despite it appearing to query page titles rather than block text content.
Applied to files:
apps/roam/src/components/CreateNodeDialog.tsxapps/roam/src/utils/useNodeContext.tsapps/roam/src/utils/hyde.tsapps/roam/src/utils/initializeObserversAndListeners.ts
📚 Learning: 2025-08-25T15:53:21.799Z
Learnt from: sid597
Repo: DiscourseGraphs/discourse-graph PR: 372
File: apps/roam/src/components/DiscourseNodeMenu.tsx:116-116
Timestamp: 2025-08-25T15:53:21.799Z
Learning: In apps/roam/src/components/DiscourseNodeMenu.tsx, when handling tag insertion, multiple leading hashtags (like ##foo) should be preserved as they represent user intent, not normalized to a single hashtag. The current regex /^#/ is correct as it only removes one leading # before adding one back, maintaining any additional hashtags the user intended.
Applied to files:
apps/roam/src/components/CreateNodeDialog.tsx
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `type` serves as the unique identifier field, not a type classification field. The interface has no `uid` or `id` field, making `node.type` the correct field to use for UID-related operations.
Applied to files:
apps/roam/src/utils/useNodeContext.ts
📚 Learning: 2025-06-17T23:37:45.289Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 220
File: apps/roam/src/utils/conceptConversion.ts:42-56
Timestamp: 2025-06-17T23:37:45.289Z
Learning: In the DiscourseNode interface from apps/roam/src/utils/getDiscourseNodes.ts, the field `node.type` serves as the UID field rather than having a conventional `node.uid` field. This is an unusual naming convention where the type field actually contains the unique identifier.
Applied to files:
apps/roam/src/utils/useNodeContext.ts
📚 Learning: 2025-11-05T21:57:14.909Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 534
File: apps/roam/src/utils/createReifiedBlock.ts:40-48
Timestamp: 2025-11-05T21:57:14.909Z
Learning: In the discourse-graph repository, the function `getPageUidByPageTitle` from `roamjs-components/queries/getPageUidByPageTitle` is a synchronous function that returns a string directly (the page UID or an empty string if not found), not a Promise. It should be called without `await`.
Applied to files:
apps/roam/src/utils/useNodeContext.tsapps/roam/src/utils/initializeObserversAndListeners.ts
📚 Learning: 2025-05-30T14:49:24.016Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.
Applied to files:
apps/roam/src/utils/hyde.ts
📚 Learning: 2025-05-20T14:04:19.632Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/schemas/embedding.sql:66-95
Timestamp: 2025-05-20T14:04:19.632Z
Learning: In the `match_embeddings_for_subset_nodes` SQL function in packages/database/supabase/schemas/embedding.sql, the number of results is implicitly limited by the length of the input array parameter `p_subset_roam_uids` since the function filters content using `WHERE c.source_local_id = ANY(p_subset_roam_uids)`.
Applied to files:
apps/roam/src/utils/hyde.ts
📚 Learning: 2025-06-25T22:56:17.522Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 0
File: :0-0
Timestamp: 2025-06-25T22:56:17.522Z
Learning: In the Roam discourse-graph system, the existence of the configuration page (identified by DISCOURSE_CONFIG_PAGE_TITLE) and its corresponding UID is a system invariant. The code can safely assume this page will always exist, so defensive null checks are not needed when using `getPageUidByPageTitle(DISCOURSE_CONFIG_PAGE_TITLE)`.
Applied to files:
apps/roam/src/utils/initializeObserversAndListeners.ts
🧬 Code graph analysis (5)
apps/roam/src/components/VectorDuplicateMatches.tsx (3)
apps/roam/src/utils/types.ts (1)
Result(42-46)apps/roam/src/utils/hyde.ts (1)
findSimilarNodesVectorOnly(539-590)apps/roam/src/utils/useNodeContext.ts (2)
NodeContext(8-11)useNodeContext(38-69)
apps/roam/src/components/CreateNodeDialog.tsx (1)
apps/roam/src/components/VectorDuplicateMatches.tsx (1)
VectorDuplicateMatches(23-158)
apps/roam/src/utils/useNodeContext.ts (2)
apps/obsidian/src/utils/extractContentFromTitle.ts (1)
extractContentFromTitle(3-10)apps/obsidian/src/utils/getDiscourseNodeFormatExpression.ts (1)
getDiscourseNodeFormatExpression(1-9)
apps/roam/src/utils/hyde.ts (2)
apps/roam/src/utils/types.ts (1)
Result(42-46)apps/roam/src/utils/supabaseContext.ts (1)
getLoggedInClient(95-112)
apps/roam/src/utils/initializeObserversAndListeners.ts (1)
apps/roam/src/components/VectorDuplicateMatches.tsx (1)
renderPossibleDuplicates(160-199)
🔇 Additional comments (3)
apps/roam/src/components/CreateNodeDialog.tsx (1)
15-15: Debounced duplicate search integration into the create dialog looks good.The 500 ms debounced
debouncedTitleand its use as bothpageTitleand explicittextforVectorDuplicateMatchesis a sensible way to surface duplicate suggestions without firing vector searches on every keystroke, and usinglimit={5}keeps the UI compact in the dialog context.Also applies to: 37-52, 150-154
apps/roam/src/components/VectorDuplicateMatches.tsx (1)
37-82: Search triggering and cancellation logic is well-structured.Using
useNodeContext(pageTitle)combined with an override whentextis provided, resettinghasSearchedonactiveContext.searchTextchanges, and guarding the fetch onisOpen && !hasSearchedprovides a clear, debounced search lifecycle. TheisCancelledflag in the effect is a good safeguard against setting state after unmount or context changes; no changes needed here.apps/roam/src/utils/hyde.ts (1)
534-590: Now I need to check the actual SQL definition to confirm the issue:Based on my verification against Supabase documentation and pgvector standards, the embedding should be passed directly to the rpc() method, not stringified. Supabase documentation examples show passing
query_embedding: embeddingdirectly, and the Supabase client automatically serializes parameters as JSON. TheJSON.stringify()call converts the embedding array to a string, which will cause a type mismatch with the SQL function'svectortype parameter.Remove
JSON.stringify()from the embedding parameter in the RPC callChange line 558:
const { data, error } = await supabase.rpc("match_content_embeddings", { query_embedding: queryEmbedding, match_threshold: threshold, match_count: limit, });⛔ Skipped due to learnings
Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 503 File: apps/roam/src/utils/syncDgNodesToSupabase.ts:304-360 Timestamp: 2025-10-18T12:05:23.834Z Learning: When inserting pgvector embeddings via Supabase (PostgREST) in TypeScript, use JSON.stringify(vector) to serialize the number[] array to a string, as PostgREST does not natively support the pgvector type. This is the official workaround for working with vector columns through the REST API.Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 182 File: apps/website/app/utils/supabase/dbUtils.ts:22-28 Timestamp: 2025-05-30T14:49:24.016Z Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.Learnt from: maparent Repo: DiscourseGraphs/discourse-graph PR: 165 File: packages/database/supabase/schemas/embedding.sql:66-95 Timestamp: 2025-05-20T14:04:19.632Z Learning: In the `match_embeddings_for_subset_nodes` SQL function in packages/database/supabase/schemas/embedding.sql, the number of results is implicitly limited by the length of the input array parameter `p_subset_roam_uids` since the function filters content using `WHERE c.source_local_id = ANY(p_subset_roam_uids)`.
mdroidian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start! But I think this would be worth re-looking at useNodeContext because there is a lot going on there that seems unnecessary, which will result in longer page load times.
| let matchedNode: DiscourseNode | null = null; | ||
|
|
||
| if (pageUid) { | ||
| const found = findDiscourseNode(pageUid, discourseNodes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we already run isDiscourseNode on page load (which calls both getDiscourseNodes and findDiscourseNode)? We shouldn't run these again.
| } | ||
|
|
||
| if (!matchedNode) { | ||
| matchedNode = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matchDiscourseNode is called by findDiscourseNode ... so this would be a third time that this is running.
| @@ -0,0 +1,69 @@ | |||
| import { useEffect, useState } from "react"; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me that useNodeContext is actually needed.
Let's make sure we need this file and all of these queries. It seems like we might only need extractContentFromTitle, which should be placed in a util file because a similar version is already used in getExportTypes (getContentFromNodes).
|
|
||
| try { | ||
| const supabase = await getLoggedInClient(); | ||
| if (!supabase) return []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this returns a false negative for those who haven't set up in our database yet. They might actually think we've searched and found nothing, but actually we haven't even searched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! We need to think through the database enabled features. Here we are checking if the client exists in supabase, but its wrong we need to check if the graph the embeddings are there in db.
And to check that we have only single route of suggestive mode, so should we be searching for if suggestive mode is enabled only then enable this feature, we can do this now but for future it could be that users don't want to enable suggestive mode but want duplicate node alert.
We need to decouple sync with suggestive mode setting until then we can use that.
c860b45 to
49170f6
Compare
49170f6 to
9c559c3
Compare


duplicate node alert on page
add to node create dialog, only show small list, fix lint errors, fix bug to show in all open pages
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.
https://www.loom.com/share/cb366c09942b4030a9933118ad281bc5