sid597
Collaborator

@sid597 sid597 commented Aug 17, 2025

Summary by CodeRabbit

  • New Features

    • AI-powered related node suggestions to help you discover connected pages and content.
    • Generates richer contextual cues to broaden search and returns ranked results by relevance.
    • Supports scanning all pages or focusing on contextually related references and parent/child blocks.
  • Refactor

    • Streamlined configuration retrieval to reduce unnecessary initialization. No user-facing changes.


linear bot commented Aug 17, 2025


supabase bot commented Aug 17, 2025

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in the packages/database/supabase directory. You can change this behaviour in Project Integrations Settings.

Preview Branches by Supabase.
Learn more about Supabase Branching.

@sid597 sid597 changed the base branch from main to eng-733-glue-ui-components-to-data-extraction-and-database August 17, 2025 18:53
@sid597 sid597 marked this pull request as ready for review August 19, 2025 14:47
@sid597
Collaborator Author

sid597 commented Aug 19, 2025

@coderabbitai full review

Contributor

coderabbitai bot commented Aug 19, 2025

✅ Actions performed

Full review triggered.

Collaborator Author

@sid597 sid597 left a comment

Most of the hyde.ts code (up to line 342) was reviewed in #131.

The new changes are:

  1. ESLint Directive Added

    /* eslint-disable @typescript-eslint/naming-convention */

  2. New Imports Added

    import { getLoggedInClient } from "./supabaseContext";
    import normalizePageTitle from "roamjs-components/queries/normalizePageTitle";
    import findDiscourseNode from "./findDiscourseNode"

  3. New Type Definitions

    type ResultItemMin = { uid: string };
    type ExistingResultGroup = {
      label: string;
      results: Record<string, ResultItemMin>;
    };

  4. API Configuration Simplified
    Old version had a more complex structure:

    BASE_URL: { DEV: "http://localhost:3000", PROD: "https://discoursegraphs.com" },
    EMBEDDINGS: { PATH: "/api/embeddings/openai/small" },
    SUPABASE: { MATCH_EMBEDDINGS_PATH: "/api/supabase/rpc/search-content" }

    New version is simpler:

    EMBEDDINGS_URL: "https://discoursegraphs.com/api/embeddings/openai/small"

  5. Removed Functions
    getBaseUrl() function was removed (no longer needed with simplified config)

  6. Modified searchEmbeddings Function
    The biggest change - it now uses direct Supabase client instead of REST API:
    Old version:

    const fullApiUrl = `${getBaseUrl()}${API_CONFIG.SUPABASE.MATCH_EMBEDDINGS_PATH}`;
    const response = await fetch(fullApiUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        queryEmbedding: queryEmbedding,
        subsetPlatformIds: subsetRoamUids,
      }),
    });

    New version:

    const supabaseClient = await getLoggedInClient();
    const { data, error } = await supabaseClient.rpc(
      "match_embeddings_for_subset_nodes",
      {
        p_query_embedding: JSON.stringify(queryEmbedding),
        p_subset_roam_uids: subsetRoamUids,
      },
    );

  7. Error Handling Changes
    • errorData type changed from implicit to unknown
    • Removed conditional error message construction in handleApiError
    • Simplified error handling in searchEmbeddings to use Supabase's error object

  8. Simplified createEmbedding
    Now directly uses API_CONFIG.EMBEDDINGS_URL instead of constructing URL with getBaseUrl()
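
For readers following the error-handling change in searchEmbeddings, here is a minimal sketch of turning a `{ data, error }` pair into either a value or a thrown Error. The `RpcResult` type and the `unwrapRpcResult` helper are hypothetical names used only for illustration; they model the destructured shape shown above and are not taken from hyde.ts.

```typescript
// Hypothetical shape modelled on the `{ data, error }` pair destructured above.
type RpcResult<T> = {
  data: T | null;
  error: { message: string } | null;
};

// Returns the data, or throws with the RPC error message attached.
// `context` is a free-form label used only to make the message readable.
function unwrapRpcResult<T>(result: RpcResult<T>, context: string): T {
  if (result.error) {
    throw new Error(`${context} failed: ${result.error.message}`);
  }
  if (result.data === null) {
    throw new Error(`${context} returned no data`);
  }
  return result.data;
}
```

A caller could pass the awaited RPC result straight through, for example `unwrapRpcResult(result, "match_embeddings_for_subset_nodes")`, and keep a single try/catch around the whole search.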

Contributor

coderabbitai bot commented Aug 19, 2025

✅ Actions performed

Full review triggered.

Contributor

coderabbitai bot commented Aug 19, 2025

📝 Walkthrough

Adds a new Hyde-based node discovery utility for Roam, implementing hypothetical node generation, embedding, and Supabase similarity search, with Roam API helpers. Also adjusts supabaseContext to lazily resolve the settings page UID within getOrCreateSpacePassword instead of at module load.

Changes

Cohort / File(s) | Summary
Hyde utility and search orchestration: apps/roam/src/utils/hyde.ts | New module implementing Hyde-based discovery: defines public types, generates hypothetical related node texts via LLM, computes embeddings, queries Supabase similarity index, merges/ranks suggestions, and exposes helper functions for fetching pages/references and assembling candidate sets. Adds performHydeSearch and related utilities.
Supabase context init timing: apps/roam/src/utils/supabaseContext.ts | Moves settingsConfigPageUid lookup into getOrCreateSpacePassword (lazy per call). Removes top-level constant; logic for reading/storing space-user-password unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant UI as Caller
  participant HYDE as hyde.ts
  participant LLM as LLM (hypothetical gen)
  participant EMB as Embedding Service
  participant SB as Supabase (embedding index)
  participant RA as Roam API

  UI->>HYDE: performHydeSearch(params)
  HYDE->>RA: Fetch candidate pages/refs (per flags)
  RA-->>HYDE: Candidate nodes

  loop For each relation triplet
    HYDE->>LLM: Generate hypothetical related node texts
    LLM-->>HYDE: Hypothetical texts
    HYDE->>EMB: Embed hypothetical texts
    EMB-->>HYDE: Embedding vectors
    HYDE->>SB: Similarity search (per embedding)
    SB-->>HYDE: Matching nodes + scores
  end

  HYDE->>HYDE: Merge, dedupe, rank by best score
  HYDE-->>UI: SuggestedNode[]

  note over HYDE: Errors/timeouts caught with logging
sequenceDiagram
  autonumber
  participant Caller
  participant SC as supabaseContext.ts
  participant RA as Roam API

  Caller->>SC: getOrCreateSpacePassword()
  SC->>RA: getPageUidByPageTitle(settings)
  RA-->>SC: settingsConfigPageUid
  SC->>RA: getBlockProps(space-user-password)
  RA-->>SC: existing or missing
  alt missing
    SC->>SC: generate UUID password
    SC->>RA: setBlockProps(space-user-password)
  end
  SC-->>Caller: password

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (6)
apps/roam/src/utils/supabaseContext.ts (2)

31-33: Optional: cache the config page UID after the first lookup

You now pay the lookup cost on every call. You can keep lazy initialization while avoiding repeated queries by caching the UID.

Apply this change within the function:

-  const settingsConfigPageUid = getPageUidByPageTitle(
-    DISCOURSE_CONFIG_PAGE_TITLE,
-  );
+  const settingsConfigPageUid =
+    _settingsConfigPageUid ||
+    (_settingsConfigPageUid = getPageUidByPageTitle(
+      DISCOURSE_CONFIG_PAGE_TITLE,
+    ));

And add this module-level cache (outside the function):

// cache lazily-resolved config page UID
let _settingsConfigPageUid: string | null = null;

35-36: Nit: simplify type of existing

string | unknown collapses to unknown in TypeScript. Either rely on inference or annotate as unknown for clarity:

const existing = props["space-user-password"] as unknown;
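
If the value is kept as unknown, it has to be narrowed before it can be used as a string. A minimal sketch of that narrowing, assuming the props object is a plain record; `readStoredPassword` is a hypothetical helper name, not something in supabaseContext.ts:

```typescript
// Hypothetical helper: narrows the unknown block prop to a usable string.
// Returns null when the prop is missing, empty, or not a string.
const readStoredPassword = (props: Record<string, unknown>): string | null => {
  const existing = props["space-user-password"];
  return typeof existing === "string" && existing.length > 0 ? existing : null;
};
```

The typeof check does the work that a `string | unknown` annotation cannot, since that union already is unknown to the compiler.
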
apps/roam/src/utils/hyde.ts (4)

113-121: Ensure AbortSignal.timeout is supported or provide a fallback

AbortSignal.timeout(...) isn’t available in all browsers/environments. Roam runs in the browser; compatibility can vary. Consider a manual AbortController-based timeout.

Example replacement:

const controller = new AbortController();
const timeoutId = setTimeout(() => {
  // Create a TimeoutError-like abort
  controller.abort(new DOMException("Timeout", "TimeoutError"));
}, API_CONFIG.LLM.TIMEOUT_MS);

try {
  response = await fetch(API_CONFIG.LLM.URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(requestBody),
    signal: controller.signal,
  });
} finally {
  clearTimeout(timeoutId);
}

Would you like me to patch this in?
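
A variant of the same idea, sketched with feature detection so the native API is used when present and the manual controller only when it is not. `createTimeoutSignal` is a hypothetical helper name, and the fallback calls abort() without a reason for the widest compatibility, so the abort error type may differ from the native TimeoutError:

```typescript
type TimeoutHandle = { signal: AbortSignal; cancel: () => void };

// Hypothetical helper: prefers AbortSignal.timeout when the runtime has it,
// otherwise falls back to an AbortController driven by setTimeout.
const createTimeoutSignal = (ms: number): TimeoutHandle => {
  const nativeTimeout = (
    AbortSignal as unknown as { timeout?: (delay: number) => AbortSignal }
  ).timeout;
  if (typeof nativeTimeout === "function") {
    // Native timers cannot be cancelled, so cancel is a no-op here.
    return { signal: nativeTimeout.call(AbortSignal, ms), cancel: () => {} };
  }
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), ms);
  return {
    signal: controller.signal,
    cancel: () => clearTimeout(timeoutId),
  };
};
```

The caller would pass `signal` to fetch and call `cancel()` in a finally block, mirroring the clearTimeout in the replacement above.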


58-67: Consider externalizing API endpoints/configs

Hardcoding API URLs/models makes testing and environment switching harder. Consider reading from environment or a centralized config.

Example: read from process.env with sane defaults.
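
One possible shape for this, as a sketch only. The environment variable names (`HYDE_EMBEDDINGS_URL`, `HYDE_LLM_TIMEOUT_MS`), the `HydeApiConfig` type, and the 30-second default are all assumptions made for illustration; only the default embeddings URL comes from this PR.

```typescript
type HydeApiConfig = { embeddingsUrl: string; llmTimeoutMs: number };

const DEFAULT_API_CONFIG: HydeApiConfig = {
  // Production URL quoted in the PR description.
  embeddingsUrl: "https://discoursegraphs.com/api/embeddings/openai/small",
  // Assumed default; the real timeout value is not shown in this thread.
  llmTimeoutMs: 30000,
};

// Reads overrides from an env-like record and falls back to the defaults.
const resolveApiConfig = (
  env: Record<string, string | undefined>,
): HydeApiConfig => {
  const timeout = Number(env["HYDE_LLM_TIMEOUT_MS"]);
  return {
    embeddingsUrl: env["HYDE_EMBEDDINGS_URL"] || DEFAULT_API_CONFIG.embeddingsUrl,
    llmTimeoutMs:
      Number.isFinite(timeout) && timeout > 0
        ? timeout
        : DEFAULT_API_CONFIG.llmTimeoutMs,
  };
};
```

Taking the env record as a parameter rather than reading process.env directly keeps the function usable in the browser and easy to exercise in tests.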


21-23: Naming nit: CandidateNodeWithEmbedding does not carry an embedding

The type only has { type: string } and a Result shape. Consider renaming to CandidateNode (or add an embedding?: number[] if you plan to use client-side matching).


344-350: All pages query: LGTM

The datalog looks correct for fetching [pageName, pageUid] pairs. Keep an eye on performance on very large graphs; you may need pagination or Supabase-backed listing (your TODO hints this).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f2807b and 4d5fb5b.

📒 Files selected for processing (2)
  • apps/roam/src/utils/hyde.ts (1 hunks)
  • apps/roam/src/utils/supabaseContext.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-25T22:56:17.522Z
Learnt from: maparent
PR: DiscourseGraphs/discourse-graph#0
File: :0-0
Timestamp: 2025-06-25T22:56:17.522Z
Learning: In the Roam discourse-graph system, the existence of the configuration page (identified by DISCOURSE_CONFIG_PAGE_TITLE) and its corresponding UID is a system invariant. The code can safely assume this page will always exist, so defensive null checks are not needed when using `getPageUidByPageTitle(DISCOURSE_CONFIG_PAGE_TITLE)`.

Applied to files:

  • apps/roam/src/utils/supabaseContext.ts
🔇 Additional comments (4)
apps/roam/src/utils/supabaseContext.ts (1)

31-33: Lazy resolution of settings page UID: LGTM

Moving getPageUidByPageTitle(DISCOURSE_CONFIG_PAGE_TITLE) into getOrCreateSpacePassword is sound and matches the PR objective. Given the invariant that the config page always exists, this is safe.

apps/roam/src/utils/hyde.ts (3)

127-127: Confirm LLM endpoint returns plain text

You’re using await response.text(). If the endpoint returns JSON (e.g., { content: "..." }), this will produce incorrect output downstream. Please confirm the API contract. If it returns JSON, parse and extract the text field instead.

Example:

const { content } = (await response.json()) as { content: string };
return content;
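
Until the contract is confirmed, a defensive reader that tolerates both shapes is another option. This is a sketch under the assumption that a JSON response would carry the text in a `content` field, as in the example above; `extractLlmText` is a hypothetical helper name.

```typescript
// Hypothetical helper: accepts the raw response body and returns the text.
// JSON bodies with a string `content` field are unwrapped; anything else
// is treated as plain text and returned unchanged.
const extractLlmText = (body: string): string => {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null && "content" in parsed) {
      const content = (parsed as { content: unknown }).content;
      if (typeof content === "string") return content;
    }
  } catch {
    // Not JSON: fall through and treat the body as plain text.
  }
  return body;
};
```

It would be called as `extractLlmText(await response.text())`, so the existing text() call stays in place.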

181-187: Double-check RPC param types (embedding marshaling)

You JSON.stringify the embedding vector. If the Postgres function match_embeddings_for_subset_nodes expects a numeric array or vector, stringify-ing may be incorrect. If it expects JSON, this is fine. Please verify the function signature and adjust accordingly.

Potential adjustment:

const { data, error } = await supabaseClient.rpc(
  "match_embeddings_for_subset_nodes",
  {
    p_query_embedding: queryEmbedding, // pass as array if function expects numeric[]
    p_subset_roam_uids: subsetRoamUids,
  }
);
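
One case where the stringified form is the right one: if the parameter is declared as a pgvector `vector` (an assumption, since the function signature is not shown in this thread), pgvector's text input format is a bracketed, comma-separated list, which is what JSON.stringify produces for an array of finite numbers. A sketch with a guard for values that would not serialize cleanly; `toVectorLiteral` is a hypothetical helper name.

```typescript
// Hypothetical helper: serializes an embedding into the "[a,b,c]" text form.
// JSON.stringify turns NaN and Infinity into null, so reject them up front.
const toVectorLiteral = (embedding: number[]): string => {
  if (embedding.length === 0) {
    throw new Error("Embedding must not be empty");
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return JSON.stringify(embedding);
};
```

If the function instead expects a numeric array, passing the array unchanged as in the adjustment above remains the safer choice.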

498-507: HYDE ranking pipeline: LGTM

The flow (generate hypotheticals → per-hypo embedding search → max aggregation → ranking) is coherent and robust to partial failures. Good error handling and early exits.
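
For reference, the max aggregation and ranking steps can be sketched as follows. The `ScoredMatch` type and `rankByBestScore` name are hypothetical and chosen for illustration; the actual types in hyde.ts may differ.

```typescript
type ScoredMatch = { uid: string; score: number };

// Takes one list of matches per hypothetical text, keeps the best score
// seen for each uid, and returns the results sorted by score, highest first.
const rankByBestScore = (perHypothesisMatches: ScoredMatch[][]): ScoredMatch[] => {
  const best = new Map<string, number>();
  for (const matches of perHypothesisMatches) {
    for (const { uid, score } of matches) {
      const previous = best.get(uid);
      if (previous === undefined || score > previous) {
        best.set(uid, score);
      }
    }
  }
  return Array.from(best, ([uid, score]) => ({ uid, score })).sort(
    (a, b) => b.score - a.score,
  );
};
```

An empty list for a failed hypothesis simply contributes nothing, which is one way the pipeline can stay robust to partial failures.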

@sid597 sid597 changed the base branch from eng-733-glue-ui-components-to-data-extraction-and-database to main August 24, 2025 16:56
@sid597 sid597 requested a review from mdroidian August 24, 2025 17:28
Contributor

@mdroidian mdroidian left a comment

💃

@sid597 sid597 merged commit 6c63455 into main Aug 25, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this to Done in General Aug 25, 2025
Labels

None yet

Projects

Status: Done

2 participants