DGAI: Create HyDe utility function - ENG-292 #147
Conversation
📝 Walkthrough
A new TypeScript module, apps/roam/src/utils/hyde.ts, is introduced, adding a findSimilarNodesUsingHyde utility.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant hyde.ts
    participant ClaudeAPI
    participant EmbeddingFunc
    participant SearchFunc
    Caller->>hyde.ts: findSimilarNodesUsingHyde(candidateNodes, currentNodeText, relationTriplets)
    loop For each relationTriplet
        hyde.ts->>ClaudeAPI: generateHypotheticalNode(currentNodeText, relationTriplet)
        ClaudeAPI-->>hyde.ts: hypotheticalNodeText
    end
    hyde.ts->>EmbeddingFunc: embed hypotheticalNodeTexts
    hyde.ts->>SearchFunc: searchAgainstCandidates(embeddings, candidateNodes)
    hyde.ts->>hyde.ts: combineScores(searchResults)
    hyde.ts->>hyde.ts: rankNodes(combinedScores)
    hyde.ts-->>Caller: suggestedNodes
```
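For orientation, here is a minimal TypeScript sketch of the flow the diagram describes. The helper signatures (`generateHypotheticalNode`, `embedTexts`, `searchAgainstCandidates`) and the summed score combination are assumptions for illustration, not the actual hyde.ts implementation:

```typescript
// A sketch of the HyDE flow from the diagram above. All helper signatures are
// assumptions passed in as dependencies; the real hyde.ts exports may differ.
type RelationTriplet = { source: string; relationLabel: string; target: string };
type Candidate = { uid: string; text: string; embedding: number[] };
type Scored = { uid: string; text: string; score: number };

export const hydeSketch = async (
  candidateNodes: Candidate[],
  currentNodeText: string,
  relationTriplets: RelationTriplet[],
  deps: {
    generateHypotheticalNode: (
      text: string,
      triplet: RelationTriplet,
    ) => Promise<string>;
    embedTexts: (texts: string[]) => Promise<number[][]>;
    searchAgainstCandidates: (
      embedding: number[],
      candidates: Candidate[],
    ) => Scored[];
  },
): Promise<Scored[]> => {
  // 1. One hypothetical node text per relation triplet, generated by the LLM.
  const hypotheticalTexts = await Promise.all(
    relationTriplets.map((triplet) =>
      deps.generateHypotheticalNode(currentNodeText, triplet),
    ),
  );

  // 2. Embed the hypothetical texts, then search each against the candidates.
  const embeddings = await deps.embedTexts(hypotheticalTexts);
  const perQueryResults = embeddings.map((embedding) =>
    deps.searchAgainstCandidates(embedding, candidateNodes),
  );

  // 3. Combine scores per candidate uid (summed here) and rank descending.
  const combined = new Map<string, Scored>();
  for (const result of perQueryResults.flat()) {
    const previous = combined.get(result.uid);
    combined.set(
      result.uid,
      previous ? { ...previous, score: previous.score + result.score } : result,
    );
  }
  return [...combined.values()].sort((a, b) => b.score - a.score);
};
```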
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 6
🧹 Nitpick comments (2)
apps/roam/src/utils/hyde.ts (2)
158-175: Expose a `topK` parameter for callers to control result set size
`rankNodes` currently returns all candidates sorted by score. This leaks low-confidence suggestions, increases bandwidth, and can overwhelm clients. Accept a `topK` option (default ≈ 20) and slice the result.
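A sketch of what that could look like, assuming a simple scored-result shape (the real `rankNodes` signature in hyde.ts may differ):

```typescript
// Hypothetical rankNodes with an optional topK (default 20) to cap the output.
type Scored = { uid: string; text: string; score: number };

export const rankNodes = (
  combinedResults: Scored[],
  { topK = 20 }: { topK?: number } = {},
): Scored[] =>
  [...combinedResults].sort((a, b) => b.score - a.score).slice(0, topK);
```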
1-13: Minor: prefer `interface` for extensibility
For publicly exported shapes (`CandidateNodeWithEmbedding`, `SuggestedNode`) consider using `interface` instead of `type` – users can extend them via declaration merging.
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 0
♻️ Duplicate comments (2)
apps/roam/src/utils/hyde.ts (2)
45-55: ⚠️ Potential issue: Prompt interpolation can still be broken by rogue back-ticks
Escaping with `\` helps, but an input containing a triple back-tick sequence followed by newlines will still terminate the fence and let the user smuggle content or system directives (classic prompt injection). Consider fencing the variables in JSON (or base64-encoding them) instead of Markdown, or at least replacing back-ticks in `node`, `relationLabel`, etc.
(Previous review raised the same concern.)
33-38: 🛠️ Refactor suggestion: Expose endpoint & timeout through environment variables
Hard-coding `ANTHROPIC_API_URL` and `ANTHROPIC_REQUEST_TIMEOUT_MS` forces a rebuild for every config change and makes local testing cumbersome. Reading them from `process.env` (with sane fallbacks) keeps the code flexible and consistent with how you already handle the model name.

```diff
-export const ANTHROPIC_API_URL =
-  "https://discoursegraphs.com/api/llm/anthropic/chat";
+export const ANTHROPIC_API_URL =
+  process.env.ANTHROPIC_API_URL ??
+  "https://discoursegraphs.com/api/llm/anthropic/chat";
@@
-export const ANTHROPIC_REQUEST_TIMEOUT_MS = 30_000;
+export const ANTHROPIC_REQUEST_TIMEOUT_MS = Number(
+  process.env.ANTHROPIC_REQUEST_TIMEOUT_MS ?? 30_000,
+);
```
🧹 Nitpick comments (2)
apps/roam/src/utils/hyde.ts (2)
78-85: Trim surrounding quotes/markdown artefacts from LLM output
Claude often wraps responses in quotes or code fences. Downstream ranking treats the text literally, so `"Some fact"` vs `Some fact` would hash to different embeddings. A small clean-up step prevents semantic duplicates.

```diff
-  return body.completion.trim();
+  return body.completion
+    .trim()
+    .replace(/^```(markdown)?\s*/i, "")
+    .replace(/```$/, "")
+    .replace(/^["']|["']$/g, "")
+    .trim();
```
157-163: Add a `topK` limit before returning to avoid overwhelming the UI
`rankNodes` currently returns all candidates, which could be hundreds. Consider slicing to the caller-requested `topK` (default 20?) to keep transport and rendering light, and avoid showing marginally relevant results.

```diff
   combinedResults.sort((a, b) => b.score - a.score);
-  return combinedResults.map((item) => ({
+  const topK = 20; // make this a parameter or env var
+  return combinedResults.slice(0, topK).map((item) => ({
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/roam/src/utils/hyde.ts (1 hunks)
🔇 Additional comments (1)
apps/roam/src/utils/hyde.ts (1)
108-125: Parallel search implementation looks great
Switching to `Promise.all` removes the N-round-trip latency bottleneck, and the defensive `try`/`catch` keeps partial failures contained. Nice improvement!
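The pattern being praised looks roughly like this (illustrative sketch; the search callback and result shape are assumptions, not the actual hyde.ts code):

```typescript
// Run one search per embedding in parallel; a failure in any single query is
// contained and contributes an empty result instead of rejecting everything.
const searchAllInParallel = async (
  embeddings: number[][],
  search: (embedding: number[]) => Promise<{ uid: string; score: number }[]>,
) =>
  Promise.all(
    embeddings.map(async (embedding) => {
      try {
        return await search(embedding);
      } catch (error) {
        console.error("HyDE search failed for one embedding:", error);
        return [];
      }
    }),
  );
```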
A few questions/change requests
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 0
♻️ Duplicate comments (1)
apps/roam/src/utils/hyde.ts (1)
131-136: ⚠️ Potential issue: LLM response assumed to be plain text – may break when JSON is returned
Earlier feedback (now reverted) highlighted that the endpoint often returns JSON (`{ completion: "..." }`). Relying on `response.text()` silently returns `[object Object]` when the payload format changes, cascading bad prompts downstream.

```diff
-  return await response.text();
+  const body = await response.json().catch(() => null);
+  if (!body || typeof body.completion !== "string") {
+    throw new Error("Unexpected payload from LLM endpoint");
+  }
+  return body.completion.trim();
```

Please reinstate structured parsing to avoid subtle failures and allow better error messages.
🧹 Nitpick comments (4)
apps/website/app/api/embeddings/openai/small/route.ts (2)
61-63: Validate `dimensions` range & model compatibility before forwarding to OpenAI
`dimensions` is only accepted for `text-embedding-3*` models and must be within the range supported by the chosen model (currently 1→1536 for -small and 1→3072 for -large). Forwarding an out-of-range value will trigger a 400 from OpenAI.

```diff
 if (dimensions && model.startsWith("text-embedding-3")) {
-  options.dimensions = dimensions;
+  if (
+    (model.endsWith("-small") && dimensions > 1536) ||
+    (model.endsWith("-large") && dimensions > 3072) ||
+    dimensions < 1
+  ) {
+    return cors(
+      req,
+      NextResponse.json(
+        { error: "`dimensions` out of range for selected model." },
+        { status: 400 },
+      ),
+    ) as NextResponse;
+  }
+  options.dimensions = dimensions;
 }
```

This prevents unnecessary upstream calls and returns an actionable error to the client.
65-71: `Promise.race` timeout doesn't cancel the OpenAI request
The pending `openai!.embeddings.create` call continues to run in the background even after the timeout fires, consuming resources and counting against rate limits. Wrap the call in an `AbortController` so both the caller and the underlying request are aborted:

```diff
-const embeddingsPromise = openai!.embeddings.create(options);
+const abortController = new AbortController();
+const embeddingsPromise = openai!.embeddings.create({
+  ...options,
+  signal: abortController.signal as any, // OpenAI typings don't expose `signal` yet
+});
 ...
 const openAIResponse = await Promise.race([embeddingsPromise, timeoutPromise])
   .finally(() => abortController.abort());
```

This keeps the worker process clean and avoids leaking concurrent connections.
apps/roam/src/utils/hyde.ts (2)
137-148: Error strings leak internal details to callers
Returning a literal string that embeds the raw `error.message` will surface stack traces, hostnames or API keys to the UI/log stream. Instead return a generic message and log the detailed error server-side (already done).

```diff
-  return `Error: Failed to generate hypothetical node. ${
-    error instanceof Error ? error.message : String(error)
-  }`;
+  return "Error: Failed to generate hypothetical node.";
```

This keeps user-visible output clean while retaining full diagnostics in server logs.
188-201: Potentially massive payload – batch `subsetRoamUids` or fall back to local cosine
All candidate UIDs are sent in one RPC call. With thousands of nodes this will blow past URL/JSON limits and slow Supabase. Consider:
- Chunk `subsetRoamUids` into e.g. 500-item batches and merge the results client-side (see the sketch after this list).
- When `indexData` already contains embeddings, compute cosine similarity locally for small datasets and skip the HTTP round-trip.

Implementing either strategy will improve robustness and reduce latency for large graphs.
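A minimal sketch of the batching option (the `searchBatch` callback stands in for the actual Supabase RPC call, which is an assumption here):

```typescript
// Split the candidate UIDs into fixed-size batches, query each batch in
// parallel, and merge the per-batch results client-side.
const BATCH_SIZE = 500;

const chunk = <T>(items: T[], size: number): T[][] =>
  Array.from({ length: Math.ceil(items.length / size) }, (_, i) =>
    items.slice(i * size, (i + 1) * size),
  );

export const searchInBatches = async <R>(
  subsetRoamUids: string[],
  searchBatch: (uids: string[]) => Promise<R[]>,
): Promise<R[]> => {
  const batches = chunk(subsetRoamUids, BATCH_SIZE);
  const results = await Promise.all(batches.map((uids) => searchBatch(uids)));
  return results.flat();
};
```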
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/roam/src/utils/hyde.ts (1 hunks)
apps/website/app/api/embeddings/openai/small/route.ts (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/roam/src/utils/hyde.ts (1)
apps/roam/src/utils/types.ts (1): Result (42-46)
🔇 Additional comments (2)
apps/website/app/api/embeddings/openai/small/route.ts (1)
27-36: Graceful fallback for missing `OPENAI_API_KEY` looks good
Early-returning a structured 500 JSON response avoids throwing and still honours CORS. Nice touch!
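The pattern being described looks roughly like this (a sketch; the `cors` helper and the exact response shape in route.ts are assumptions):

```typescript
import { NextRequest, NextResponse } from "next/server";

// Minimal stand-in for the project's cors() helper (an assumption here).
const cors = (req: NextRequest, res: NextResponse): NextResponse => {
  res.headers.set(
    "Access-Control-Allow-Origin",
    req.headers.get("origin") ?? "*",
  );
  return res;
};

export const POST = async (req: NextRequest) => {
  // Early-return a structured 500 (still CORS-wrapped) instead of throwing
  // when the API key is not configured.
  if (!process.env.OPENAI_API_KEY) {
    return cors(
      req,
      NextResponse.json(
        { error: "OPENAI_API_KEY is not configured." },
        { status: 500 },
      ),
    );
  }
  // ...create the OpenAI client and compute embeddings here...
  return cors(req, NextResponse.json({ ok: true }));
};
```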
apps/roam/src/utils/hyde.ts (1)
303-314: ❌ Incorrect review comment
`type` shadowing risk – reserve the key or rename
`type` is added to `CandidateNodeWithEmbedding` but `Result` already allows arbitrary keys. If the original Roam node object contains its own `type` field, this will silently overwrite the value when you strip `embedding`:

```typescript
const { embedding, ...restOfNode } = fullNode; // `type` ambiguity
```

Safer to namespace the helper metadata, e.g. `nodeType` or `_dgType`, to avoid collisions.

```diff
-export type CandidateNodeWithEmbedding = Result & {
-  type: string;
+export type CandidateNodeWithEmbedding = Result & {
+  hydeType: string;
   embedding: EmbeddingVectorType;
 };
```

Update downstream destructuring accordingly.
🏁 Script executed:

```bash
#!/bin/bash
# Locate the definition of CandidateNodeWithEmbedding
rg "export type CandidateNodeWithEmbedding" -n -A3 -B3 apps/roam/src/utils/hyde.ts
```

Length of output: 256

🏁 Script executed:

```bash
#!/bin/bash
# Show imports and type definitions at top of the file, plus occurrences of `Result`
sed -n '1,80p' apps/roam/src/utils/hyde.ts
echo "----- Occurrences of Result -----"
rg -n "Result" -C2 apps/roam/src/utils/hyde.ts
```

Length of output: 4527
Ignore the `type` rename: no shadowing in practice
The only `type` property on these nodes comes from your own `CandidateNodeWithEmbedding`/`SuggestedNode` aliases; the original Roam-fetched objects here only carry `uid` and `text`. When you do `const { embedding, ...restOfNode } = fullNode;`, `restOfNode.type` is always the custom field you declared, and there's nothing for it to collide with. No downstream code pulls in any other `type` key, so you can leave the definition as-is.

Likely an incorrect or invalid review comment.