Collaborator

@maparent maparent commented Jun 9, 2025

Proposed replacement for the alpha-upload-discourse-nodes function in supabase/schema/upload_temp.sql

Summary by CodeRabbit

  • New Features

    • Added support for batch upserting documents and content, including handling of local and inline references, metadata, and embedding vectors.
    • Enabled upserting of platform account information as part of document and content synchronization.
    • Introduced new database functions for inserting or updating documents, content, and embeddings in bulk.
  • Documentation

    • Added detailed usage examples for the new upsert functions, demonstrating how to structure input data and handle references in TypeScript.
  • Database

    • Added new composite types and functions to support structured input and upsert operations.
    • Enforced uniqueness constraints on content embeddings to prevent duplicate entries.

Contributor

coderabbitai bot commented Jun 9, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Walkthrough

This update introduces new composite types, utility functions, and upsert procedures for documents and content in both the database schema and TypeScript types. It adds unique constraints for embeddings, comprehensive SQL upsert logic for handling inline/local references, and detailed documentation with TypeScript usage examples. Type definitions and schema files are updated accordingly.

Changes

| File(s) | Change Summary |
|---|---|
| packages/database/doc/upsert_content.md | Added documentation with TypeScript usage examples for upsert_content and upsert_documents RPC functions. |
| packages/database/input_types.ts | Introduced TypeScript types for local document and content input structures with inline overrides. |
| packages/database/schema.yaml | Added unique key constraint unique_target_model on ContentEmbedding for the target slot. |
| packages/database/supabase/migrations/20250606202159_content_upsert_function.sql | Migration: Created unique index, composite types, and upsert functions for documents, content, and embeddings. |
| packages/database/supabase/schemas/content.sql | Defined composite types and PL/pgSQL functions for upserting documents/content with local/inline references and embeddings. |
| packages/database/supabase/schemas/embedding.sql | Added a unique index on target_id for the ContentEmbedding_openai_text_embedding_3_small_1536 table. |
| packages/database/types.gen.ts | Extended generated types with new function signatures and composite types for upsert/content/document/embedding operations. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SupabaseRPC
    participant DB

    Client->>SupabaseRPC: Call upsert_documents(space_id, docs_json)
    SupabaseRPC->>DB: For each document:\n- Resolve inline author/platform\n- Insert/update Document
    DB-->>SupabaseRPC: Return Document IDs
    SupabaseRPC-->>Client: Document IDs

    Client->>SupabaseRPC: Call upsert_content(space_id, content_json, creator_id)
    SupabaseRPC->>DB: For each content:\n- Resolve inline document/author/creator\n- Insert/update Content\n- Upsert embedding if present
    DB-->>SupabaseRPC: Return Content IDs
    SupabaseRPC-->>Client: Content IDs
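
The two-step flow in the diagram could be driven from a client roughly as follows. This is only a sketch: the `Rpc` type and `syncSpace` helper are hypothetical, and the argument names (`space_id`, `docs_json`, `content_json`, `creator_id`) are taken from the diagram labels, so the real RPC parameter names may differ.

```typescript
// Assumed minimal RPC signature for illustration; a real Supabase client
// returns { data, error } instead of the bare id list used here.
type Rpc = (fn: string, args: Record<string, unknown>) => Promise<number[]>;

// Upsert documents first, then content, mirroring the order in the diagram.
const syncSpace = async (
  rpc: Rpc,
  spaceId: number,
  creatorId: number,
  docs: object[],
  content: object[],
): Promise<{ documentIds: number[]; contentIds: number[] }> => {
  // Argument names below follow the diagram and are assumptions.
  const documentIds = await rpc("upsert_documents", {
    space_id: spaceId,
    docs_json: docs,
  });
  const contentIds = await rpc("upsert_content", {
    space_id: spaceId,
    content_json: content,
    creator_id: creatorId,
  });
  return { documentIds, contentIds };
};
```

Running the document upsert first presumably lets content rows refer to those documents by local id, although content can also carry its document inline.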

Suggested reviewers

  • mdroidian

Poem

A hop, a skip, a schema grows,
With upserts swift and types that glow.
Embeddings nest with unique delight,
While docs and content sync just right.
From rabbit paws, this code takes flight—
May your migrations run smooth tonight!
🐇✨

Collaborator Author

maparent commented Jun 9, 2025

I had to delete and recreate this branch, because GitHub deleted it when the old point of comparison was deleted.
@mdroidian you approved the earlier one, but I'm not sure if that was intentional or a point of confusion with the work in 417.
Anyway this is worth conceptual review before more coderabbit review.

Contributor

mdroidian commented Jun 10, 2025

> I had to delete and recreate this branch, because GitHub deleted it when the old point of comparison was deleted. @mdroidian you approved the earlier one, but I'm not sure if that was intentional or a point of confusion with the work in 417. Anyway this is worth conceptual review before more coderabbit review.

Probably is good. But this will depend on what shows up after the scoping.

As an aside, the comments on the RPC are helpful, but I'm still not satisfied with the DX. RPCs feel buried, and it's not easy to understand what is required.

Take upsert_content, for example. Without the comment, I'd have no idea what data to put in. But even with the comment, I don't know which fields are required. And when calling the function, anything will pass:
image

Maybe we need something like a helper file that has zod validation/wrappers

// rpc-contracts.ts
import { z } from "zod";

export const upsertContentArgs = z.object({
  space_id: z.number(),
  data: z.any(), // fill this out
  creator_id: z.number(),
  content_as_document: z.boolean().optional(),
});

export type UpsertContentArgs = z.infer<typeof upsertContentArgs>;
export type UpsertContentResult = number[];

// lib/rpc/upsertContent.ts
import { supabase } from "@/lib/supabaseClient";
import { upsertContentArgs, type UpsertContentArgs, type UpsertContentResult } from "./rpc-contracts";

// can add more comments here describing use
export const upsertContent = async (args: UpsertContentArgs): Promise<UpsertContentResult> => {
  // Validate the arguments at runtime before they reach the database.
  const parsed = upsertContentArgs.parse(args);
  const { data, error } = await supabase.rpc("upsert_content", parsed);

  if (error) throw error;
  return data;
};

@maparent
Collaborator Author

@CodeRabbit review

Contributor

coderabbitai bot commented Jun 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🔭 Outside diff range comments (1)
packages/database/types.gen.ts (1)

291-303: ⚠️ Potential issue

Type mismatch – vector is string here but number[] elsewhere

Rows & inserts for ContentEmbedding_openai_text_embedding_3_small_1536 expose vector: string, yet
inline_embedding_input.vector and the upsert_content_embedding RPC expect number[].

This inconsistency forces consumers to as unknown as string cast, defeating type-safety.

If Supabase code-gen can’t map extensions.vector to number[], create a manual override:

export type Vector = number[];
// … patch generated types via module-augmentation

or change the RPC to accept string.
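
One way to bridge the two representations at the application boundary is a pair of small converters. This is a sketch that assumes the generated `string` type carries pgvector's text form (`[0.1,0.2,...]`); the names `Vector`, `toPgVector`, and `fromPgVector` are hypothetical and not part of the PR.

```typescript
// Hypothetical helpers for illustration; not part of the PR.
type Vector = number[];

// Serialize a numeric vector into a pgvector text literal, e.g. "[0.1,0.2]".
const toPgVector = (v: Vector): string => {
  if (v.some((x) => !Number.isFinite(x))) {
    throw new Error("vector contains a non-finite component");
  }
  return `[${v.join(",")}]`;
};

// Parse a pgvector text literal back into an array of numbers.
const fromPgVector = (s: string): Vector => {
  const trimmed = s.trim();
  if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
    throw new Error(`not a vector literal: ${s}`);
  }
  const body = trimmed.slice(1, -1).trim();
  if (body === "") return [];
  return body.split(",").map((part) => {
    const text = part.trim();
    const n = Number(text);
    // Reject empty components explicitly, since Number("") is 0.
    if (text === "" || !Number.isFinite(n)) {
      throw new Error(`bad vector component: "${part}"`);
    }
    return n;
  });
};
```

With helpers like these, callers can keep `number[]` in application code and convert only where a row type demands `string`, instead of casting through `unknown`.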

♻️ Duplicate comments (2)
packages/database/supabase/schemas/content.sql (2)

306-353: Mirror the contents update fix in upsert_documents
The snapshot function in the schema omits updating contents on conflict, just like in the migration.


381-483: Unused v_creator_id and content_as_document here too
As noted in the migration script, these parameters are never used. Adjust or remove.

🧹 Nitpick comments (8)
packages/database/doc/upsert_content.md (4)

1-1: Remove trailing full-stop in the heading

Markdown-lint (MD026) flags this; keeping headings punctuation-free prevents duplicated punctuation in ToC generators.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

1-1: Trailing punctuation in heading
Punctuation: '.'

(MD026, no-trailing-punctuation)


5-7: Import statement should not carry a file-extension

typescript/ts-node will happily accept it, but most bundlers (Vite, Next, etc.) expect extension-less bare-specifiers so that they can rewrite to .js.

-import type { LocalDocumentDataInput, LocalContentDataInput } from '@repo/database/input_types.ts';
+import type { LocalDocumentDataInput, LocalContentDataInput } from '@repo/database/input_types';

30-32: Example vector is unreadable – shorten or elide

The 1536-dimension array overwhelms the doc and causes VS Code to choke on formatting. Provide 3-4 sample values followed by /* … */ so readers understand the shape without scrolling a thousand lines.


35-47: Code sample omits an async context / error handling

await client.rpc must run inside an async function, or at the top level of a module in a runtime that supports top-level await (e.g. Deno). Wrap both examples or add a note; otherwise newcomers will copy-paste into plain scripts and hit “await is only valid in async functions”.
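
A wrapper along these lines would give the examples both an async context and basic error handling. It is a sketch only: `RpcClient`, `RpcResult`, and `callRpc` are hypothetical names, and the interface models just the `{ data, error }` result shape that the call is assumed to return.

```typescript
// Assumed minimal shape of the client; only what this sketch relies on.
type RpcResult<T> = { data: T | null; error: { message: string } | null };

interface RpcClient {
  rpc<T>(fn: string, args: Record<string, unknown>): Promise<RpcResult<T>>;
}

// `await` lives inside an async function, and a failed call becomes an
// exception instead of a silently ignored `error` field.
const callRpc = async <T>(
  client: RpcClient,
  fn: string,
  args: Record<string, unknown>,
): Promise<T> => {
  const { data, error } = await client.rpc<T>(fn, args);
  if (error) throw new Error(`${fn} failed: ${error.message}`);
  if (data === null) throw new Error(`${fn} returned no data`);
  return data;
};
```

A doc example could then read `const ids = await callRpc<number[]>(client, "upsert_documents", args)` inside an `async` function, where `args` stands for whatever parameters the RPC actually takes.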

packages/database/input_types.ts (2)

1-2: Path portability – prefer extension-less import

Same argument as in docs; ersatz bundlers dislike .ts in bare specifiers.

-import { Database, TablesInsert } from "./types.gen.ts";
+import { Database, TablesInsert } from "./types.gen";

3-5: Partial<…> swallows required columns – deliberate?

Wrapping the intersected type in Partial makes every slot optional, including source_local_id, created, etc.
If some fields are genuinely mandatory (as enforced in SQL), keep them outside the Partial:

export type LocalDocumentDataInput =
  Omit<Database['public']['CompositeTypes']['document_local_input'], 'author_inline'> & {
    // still optional
    author_inline?: Partial<TablesInsert<'PlatformAccount'>>
  };
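
If only a few columns are mandatory, a small utility type can keep them required while relaxing the rest. The sketch below uses a hypothetical `DocumentRow` stand-in rather than the generated composite type, and treating `source_local_id` and `created` as the mandatory fields is an assumption based on the examples named above.

```typescript
// Keep the keys in K mandatory; every other key becomes optional.
type PartialExcept<T, K extends keyof T> = Partial<Omit<T, K>> & Pick<T, K>;

// Hypothetical stand-in for the generated composite type.
type DocumentRow = {
  source_local_id: string;
  created: string;
  url: string;
  metadata: Record<string, unknown>;
};

// Assumption: these two fields are the ones enforced on the SQL side.
type DocumentInput = PartialExcept<DocumentRow, "source_local_id" | "created">;

// Compiles: both mandatory fields are present, the others are omitted.
const minimalDocument: DocumentInput = {
  source_local_id: "page-1",
  created: "2025-06-09T00:00:00Z",
};

// Would not compile, because the mandatory fields are missing:
// const incomplete: DocumentInput = { url: "https://example.com" };
```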
packages/database/supabase/migrations/20250606202159_content_upsert_function.sql (1)

1-2: Make index creation idempotent
Consider adding IF NOT EXISTS to CREATE UNIQUE INDEX to avoid failures on repeated runs in environments where the index may already exist.

packages/database/supabase/schemas/content.sql (1)

122-124: Re-evaluate NULLS DISTINCT on content index
Same as above—ensure your Postgres version supports NULLS DISTINCT or prefer a WHERE source_local_id IS NOT NULL clause for clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb2b6d9 and 32b241c.

📒 Files selected for processing (7)
  • packages/database/doc/upsert_content.md (1 hunks)
  • packages/database/input_types.ts (1 hunks)
  • packages/database/schema.yaml (1 hunks)
  • packages/database/supabase/migrations/20250606202159_content_upsert_function.sql (1 hunks)
  • packages/database/supabase/schemas/content.sql (2 hunks)
  • packages/database/supabase/schemas/embedding.sql (1 hunks)
  • packages/database/types.gen.ts (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/database/input_types.ts (1)
packages/database/types.gen.ts (2)
  • Database (9-747)
  • TablesInsert (778-799)

🔇 Additional comments (6)
packages/database/input_types.ts (1)

6-12: document_inline should probably be optional

LocalContentDataInput currently allows content objects that must embed a full document_inline, even when document_local_id is supplied. Prefix the field with ? so callers can choose either linkage strategy.

-    document_inline: LocalDocumentDataInput,
+    document_inline?: LocalDocumentDataInput,
packages/database/types.gen.ts (1)

696-744: Composite types still contain nullable scalar fields – enforce at least one locator

content_local_input allows all of document_id, document_local_id, and document_inline to be null simultaneously, making it impossible for the server to resolve the target document.

Consider a check constraint in the PL/pgSQL wrapper or split the type into “by-id” vs “inline” variants.
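
Until such a check exists on the server, the same rule could be applied client-side before calling the RPC. This is a minimal sketch: `DocumentLocator`, `hasDocumentLocator`, and `findUnresolvable` are hypothetical names, and the field types are assumptions; only the three field names come from the comment above.

```typescript
// Hypothetical, trimmed-down view of content_local_input: just the three
// document locator fields. The field types are assumed for illustration.
type DocumentLocator = {
  document_id?: number | null;
  document_local_id?: string | null;
  document_inline?: Record<string, unknown> | null;
};

// True when at least one way of resolving the target document is present.
// `!= null` rejects both null and undefined.
const hasDocumentLocator = (c: DocumentLocator): boolean =>
  c.document_id != null ||
  c.document_local_id != null ||
  c.document_inline != null;

// Returns the indices of entries that carry no locator at all.
const findUnresolvable = (items: DocumentLocator[]): number[] =>
  items
    .map((item, i) => (hasDocumentLocator(item) ? -1 : i))
    .filter((i) => i >= 0);
```

Reporting indices rather than throwing on the first failure lets a caller list every offending entry in a batch at once.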

packages/database/supabase/migrations/20250606202159_content_upsert_function.sql (3)

3-50: New composite input types look solid
The document_local_input, inline_embedding_input, and content_local_input types cleanly encapsulate both DB columns and local/inline references.


122-152: upsert_platform_account_input is well-implemented
The use of ON CONFLICT with COALESCE ensures safe upserts of platform accounts and returns the correct id.


208-227: Content embedding upsert is correct
The warning path prevents failures on invalid embeddings, and the ON CONFLICT branch properly refreshes vector and obsolete.

packages/database/supabase/schemas/content.sql (1)

45-46: Check PG version support for NULLS DISTINCT
NULLS DISTINCT in unique indexes requires PG 15+. Verify compatibility or replace with a partial index:

CREATE UNIQUE INDEX … ON public."Document"(space_id, source_local_id)
WHERE source_local_id IS NOT NULL;

Contributor

@mdroidian mdroidian left a comment

👍

Would love some Zod tho, re: this comment 😁

@@ -0,0 +1,13 @@
import { Database, TablesInsert } from "./types.gen.ts";

nit: TypeScript files are generally camelCase. If you change it, also update the .md file that references it.

Collaborator Author

ah, missed that before merging. I'll make a small PR for this.

@maparent
Collaborator Author

I agree validation would help. Typing only helps up to a point. This is what we wanted drizzle to help with, later. I'm still -1 on introducing zod before drizzle is there, as it's wasted effort.

@maparent maparent force-pushed the ENG-415-content-upsert-function branch from 1b56e75 to a20324f Compare June 23, 2025 13:41
@maparent maparent merged commit 4012930 into main Jun 23, 2025
2 of 3 checks passed
@github-project-automation github-project-automation bot moved this to Done in General Jun 23, 2025
@maparent maparent deleted the ENG-415-content-upsert-function branch June 23, 2025 13:41