-
Notifications
You must be signed in to change notification settings - Fork 2
ENG-449: Calculate the embeddings for updated discourse nodes #231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENG-449: Calculate the embeddings for updated discourse nodes #231
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
📝 WalkthroughWalkthroughA new type alias, Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant fetchEmbeddingsForNodes
participant EmbeddingAPI
Caller->>fetchEmbeddingsForNodes: Provide array of DiscourseGraphContent nodes
loop For each batch of 100 nodes
fetchEmbeddingsForNodes->>EmbeddingAPI: POST /api/embeddings with batch texts
EmbeddingAPI-->>fetchEmbeddingsForNodes: Return embeddings array
end
fetchEmbeddingsForNodes-->>Caller: Return nodes augmented with model and vector
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
apps/roam/src/utils/fetchEmdeggingsForNodes.ts (2)
18-18
: Consider making the API URL configurable.The API URL is hardcoded, which may make it difficult to use different environments or endpoints. Consider making this configurable through environment variables or configuration.
- const apiUrl = `https://discoursegraphs.com/api/embeddings/openai/small`; + const apiUrl = process.env.EMBEDDINGS_API_URL || `https://discoursegraphs.com/api/embeddings/openai/small`;
66-66
: Consider making the model name configurable.The model name is hardcoded and should ideally match the API endpoint or be configurable to maintain consistency across different embedding models.
- model: "openai_text_embedding_3_small_1536", + model: process.env.EMBEDDING_MODEL_NAME || "openai_text_embedding_3_small_1536",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/roam/src/utils/fetchEmdeggingsForNodes.ts
(1 hunks)apps/roam/src/utils/types.ts
(1 hunks)
🔇 Additional comments (2)
apps/roam/src/utils/types.ts (1)
50-59
: LGTM! Well-structured type definition.The
DiscourseGraphContent
type is well-defined with appropriate property types and follows consistent naming conventions. Thevector
property asnumber[]
correctly represents embedding vectors.apps/roam/src/utils/fetchEmdeggingsForNodes.ts (1)
11-69
: Excellent implementation with robust error handling.The function is well-structured with:
- Proper batch processing to handle large datasets
- Comprehensive error handling for HTTP and data validation
- Clear error messages with batch information
- Validation that input and output counts match
- Good TypeScript typing throughout
The implementation effectively handles the complexity of batch processing while maintaining data integrity.
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
apps/roam/src/utils/fetchEmbeddingsForNodes.ts.ts (1)
1-70
: Fix the filename extension.The file has a double
.ts.ts
extension which is incorrect and may cause build issues.Rename the file from
fetchEmbeddingsForNodes.ts.ts
tofetchEmbeddingsForNodes.ts
.
🧹 Nitpick comments (4)
apps/roam/src/utils/types.ts (1)
50-59
: Consider making embedding-related fields optional.The type definition looks good overall, but consider whether
model
andvector
fields should be optional since nodes may not have embeddings initially before processing throughfetchEmbeddingsForNodes
.export type DiscourseGraphContent = { author_local_id: string; source_local_id: string; scale: string; created: string; last_modified: string; text: string; - model: string; - vector: number[]; + model?: string; + vector?: number[]; };This would allow the type to represent both input nodes (without embeddings) and output nodes (with embeddings) from the utility function.
apps/roam/src/utils/fetchEmbeddingsForNodes.ts.ts (3)
32-46
: Improve error handling robustness.The error handling could be more robust. The current implementation may fail to parse error responses properly in some cases.
if (!response.ok) { - let errorData; + let errorMessage = `Server responded with ${response.status}`; try { - errorData = await response.json(); + const errorData = await response.json(); + errorMessage = errorData.error || errorData.message || errorMessage; } catch (e) { - errorData = { - error: `Server responded with ${response.status}: ${await response.text()}`, - }; + try { + const textResponse = await response.text(); + errorMessage = textResponse || errorMessage; + } catch (textError) { + // Use default error message + } } throw new Error( - `API Error (${response.status}) processing batch ${ - i / EMBEDDING_BATCH_SIZE + 1 - }: ${errorData.error || "Failed to fetch embeddings"}`, + `API Error (${response.status}) processing batch ${ + i / EMBEDDING_BATCH_SIZE + 1 + }: ${errorMessage}`, ); }
66-66
: Make model name consistent with API endpoint.The hardcoded model name should match the API endpoint being used or be configurable.
+const EMBEDDING_MODEL = "openai_text_embedding_3_small_1536"; + export const fetchEmbeddingsForNodes = async ( nodes: DiscourseGraphContent[], ): Promise<DiscourseGraphContent[]> => { // ... rest of function return nodes.map((node, i) => ({ ...node, - model: "openai_text_embedding_3_small_1536", + model: EMBEDDING_MODEL, vector: allEmbeddings[i], })) as DiscourseGraphContent[];This ensures consistency and makes it easier to update if the model changes.
11-69
: Consider adding rate limiting and retry logic.For production use with external APIs, consider adding rate limiting and retry logic to handle temporary failures gracefully.
The function currently makes all API calls without any delay or retry mechanism. For better reliability:
- Add retry logic for transient failures (network timeouts, 5xx errors)
- Add exponential backoff between retries
- Consider adding a delay between batches to respect API rate limits
- Add timeout configuration for fetch requests
This would make the function more robust in production environments where network issues or API rate limits may occur.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/roam/src/utils/fetchEmbeddingsForNodes.ts.ts
(1 hunks)apps/roam/src/utils/types.ts
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
apps/roam/src/utils/fetchEmbeddingsForNodes.ts.ts (1)
apps/roam/src/utils/types.ts (1)
DiscourseGraphContent
(50-59)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 A few changes.
https://www.loom.com/share/44125280d2eb4aec921b0933e8e27ee3?sid=b17f08dc-372c-45ab-88c0-4209cb50e250
Loom does not have access to console log so adding a screenshot of the console.log
Tested using stub code that fetches 5 discourse nodes from graph and creates embeddings for them using this function
Summary by CodeRabbit