-
Notifications
You must be signed in to change notification settings - Fork 2
ENG-950 Refactor the concept query function with new security-scoped views to avoid RLS slowdown #482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENG-950 Refactor the concept query function with new security-scoped views to avoid RLS slowdown #482
Conversation
Updates to Preview Branch (eng-950-optimize-data-access-following-the-simulation) ↗︎
Tasks are run on every commit but only new migration files are pushed.
View logs for this Workflow Run ↗︎. |
@CodeRabbit review |
✅ Actions performedReview triggered.
|
9dc305e
to
8b79fa1
Compare
@CodeRabbit review again, I force-pushed and changed the base. |
@maparent: I'll perform a full review of the PR now that you've force-pushed and changed the base branch. ✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughAdds privacy-scoped my_* views/functions and accompanying FK/relationship typings; updates queries/types to target those views; implements RLS helper functions (is_my_account, my_account(), my_space_ids(), in_space, etc.); centralizes embedding view; improves error reporting in space creation and signs out anonymous user after sign-in. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant EdgeFn as create-space Edge Function
participant Auth as Supabase Auth
participant DB as DB (RLS + my_* helpers)
Client->>EdgeFn: POST /create-space (credentials)
EdgeFn->>Auth: signInWithPassword()
Auth-->>EdgeFn: session (password user)
rect rgb(220,240,255)
note right of EdgeFn: create or fetch space via privacy views/functions
EdgeFn->>DB: call my_account()/my_space_ids()/insert Space / SpaceAccess
DB-->>EdgeFn: success or error
end
alt success
EdgeFn->>Auth: signOut({ scope: "local" })
EdgeFn-->>Client: 200 Created (space info)
else failure
EdgeFn->>DB: read error context body (if any)
EdgeFn-->>Client: aggregated error message
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8b79fa1
to
1f6d684
Compare
@CodeRabbit is this title more clear? |
Tip For best results, initiate chat on the files or code changes. @maparent: The current title "ENG-950 Optimize data access following the simulation" is partially clear, but the phrase "following the simulation" is ambiguous. It's unclear whether this means:
Based on the changes in this PR (security views, API renaming, expanded query capabilities), a clearer title might be: Option 1: If "simulation" is important context from the linked issue, you could clarify it: Which aspect of the changes would you like to emphasize in the title? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (6)
packages/database/src/lib/contextFunctions.ts (3)
106-107
: Add error handling for Response body parsing.The
new Response(...).text()
operation can throw ifresult2.error.context.body
is not a validBodyInit
type. Consider wrapping this in a try-catch to prevent the error handler from itself throwing an error.Apply this diff to add error handling:
- if (result2.error?.context?.body) - error += await new Response(result2.error.context.body).text(); + if (result2.error?.context?.body) { + try { + error += await new Response(result2.error.context.body).text(); + } catch (e) { + error += " [Failed to parse error context body]"; + } + }
108-108
: Consider removing console.log in production.The console.log might expose sensitive error details in production environments. Consider using a proper logging framework with appropriate log levels, or guard this with an environment check.
105-105
: Redundant type cast.The cast
as string | undefined
is redundant sinceresult2.error?.message
already producesstring | undefined
through optional chaining.Apply this diff to simplify:
- let error: string = (result2.error?.message as string | undefined) || ""; + let error: string = result2.error?.message || "";packages/database/supabase/functions/create-space/index.ts (1)
117-117
: LGTM! Proper session cleanup.The
signOut({ scope: "local" })
call correctly cleans up the session created by the sign-in attempt used to check anonymous user existence. Thescope: "local"
parameter appropriately limits the sign-out to the current session without affecting other sessions.Note: The signOut is called even when sign-in fails with
invalid_credentials
, but this is harmless as Supabase's signOut safely handles the absence of an active session.Optional: Consider using admin API for user existence checks.
For better idiomaticity, consider refactoring to use
supabase.auth.admin.listUsers({ filter: { email } })
or a similar admin method instead of signing in to check user existence. This would:
- Avoid creating unnecessary sessions
- Be more explicit about the intent
- Eliminate the need for immediate sign-out
However, this refactor can be deferred as the current implementation is functional and the signOut addition properly addresses the session cleanup concern.
packages/database/supabase/schemas/account.sql (1)
189-193
: Optional: LIMIT 1 in CTEs is redundant.The
WITH u AS (SELECT auth.uid() LIMIT 1)
pattern includes an unnecessaryLIMIT 1
sinceauth.uid()
returns a single value. While not harmful, it adds noise.Consider simplifying:
- WITH u AS (SELECT auth.uid() LIMIT 1) + WITH u AS (SELECT auth.uid())Also applies to: 202-205, 214-218
packages/database/src/lib/queries.ts (1)
153-153
: Security-scoped queries via my_concepts view.Switching from the
Concept
table to themy_concepts
view ensures queries automatically respect user access permissions through the view'smy_space_ids()
filter.Performance consideration: Views may impact query planning. If performance degrades, consider:
- Ensuring indexes on
Concept.space_id
exist- Monitoring query execution plans
- Potentially materializing views for read-heavy workloads
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
packages/database/src/dbTypes.ts
(11 hunks)packages/database/src/lib/contextFunctions.ts
(1 hunks)packages/database/src/lib/queries.ts
(4 hunks)packages/database/supabase/functions/create-space/index.ts
(2 hunks)packages/database/supabase/migrations/20251008131345_security_views.sql
(1 hunks)packages/database/supabase/schemas/access_token.sql
(1 hunks)packages/database/supabase/schemas/account.sql
(4 hunks)packages/database/supabase/schemas/concept.sql
(5 hunks)packages/database/supabase/schemas/content.sql
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Supabase Preview
- GitHub Check: Supabase Preview
- GitHub Check: Supabase Preview
🔇 Additional comments (13)
packages/database/supabase/functions/create-space/index.ts (2)
197-210
: LGTM! Improved code organization.Moving the origin validation outside the OPTIONS block improves code organization by allowing the
originIsAllowed
variable to be reused for both the OPTIONS response and the success response at line 238. This reduces potential duplication and makes the flow clearer.
189-189
: LGTM! Style improvement.Adding the semicolon improves consistency with the codebase's style conventions.
packages/database/src/dbTypes.ts (1)
561-812
: LGTM - Auto-generated types align with schema changes.The new
my_concepts
,my_contents
, andmy_documents
view definitions are correctly generated from the SQL schema, with appropriate nullable fields and relationship definitions.packages/database/supabase/schemas/concept.sql (2)
149-240
: Function overloading pattern is appropriate.The overloaded functions for
my_concepts
view (e.g.,schema_of_concept
,content_of_concept
,author_of_concept
) provide a clean API for both direct table access and view-filtered access, maintaining backward compatibility while enabling security-scoped queries.
118-135
: No changes needed: null my_space_ids() yields empty result as intendedpackages/database/supabase/schemas/account.sql (1)
267-294
: LGTM - Security views properly implement access control.The
my_spaces
andmy_accounts
views correctly filter by user's accessible spaces usingmy_space_ids()
, providing a clean security boundary for client queries.packages/database/supabase/schemas/access_token.sql (1)
43-43
: LGTM - Policy correctly updated to use is_my_account.The change from
my_account(platform_account_id)
tois_my_account(platform_account_id)
is semantically clearer and consistent with the account.sql refactoring.packages/database/src/lib/queries.ts (2)
117-121
: Correct use of PostgREST function relationship syntax.The changes to
Document:document_of_content
andContent:content_of_concept
properly leverage PostgREST's computed relationship feature, calling the SQL functions that return related rows.
177-177
: Case change aligns with PostgREST embedded resource naming.The change from
content.source_local_id
toContent.source_local_id
(capital C) is consistent with PostgREST's naming convention where embedded resources use the relationship name defined in the query (line 121:Content:content_of_concept
).packages/database/supabase/schemas/content.sql (2)
73-84
: LGTM - Views follow consistent security pattern.The
my_documents
andmy_contents
views correctly filter bymy_space_ids()
, maintaining consistency with the security model introduced for concepts and accounts.Note: These views are subject to the same NULL handling issue as
my_concepts
- ensuremy_space_ids()
returns an empty array instead of NULL when the user has no space access (see comment on account.sql line 209-219).Also applies to: 177-192
194-214
: Relationship functions properly implement one-to-one semantics.The
document_of_content
andauthor_of_content
functions correctly useROWS 1
andSTRICT STABLE
attributes, providing efficient computed relationships for PostgREST queries.packages/database/supabase/migrations/20251008131345_security_views.sql (2)
97-255
: Migration structure is well-organized and idempotent.The migration properly uses:
CREATE OR REPLACE
for functions and viewsDROP POLICY IF EXISTS
before creating policies- Descriptive comments on functions
This ensures the migration can be safely re-applied if needed.
1-1
: No external references to oldmy_account()
All found calls are within schema definitions or migrations that this script renames or recreates. Migration handles every internal usage.
@CodeRabbit: views were introduced precisely because performance was dismal without them. They do impact performance, but in the right direction. |
Tip For best results, initiate chat on the files or code changes. @maparent: Thank you for the clarification! I appreciate you explaining the performance motivation behind these changes. It's valuable to know that:
This context helps understand why the PR shifts from direct table queries to view-based access patterns. The security benefits align nicely with the performance improvements. 🐰✨ ✏️ Learnings added
|
1ee631d
to
129ace9
Compare
129ace9
to
69d76f7
Compare
Deciding to wait for review, there will be iterations before it's pushed. |
ac99941
to
71e4ccf
Compare
dd2c2d1
to
028406b
Compare
028406b
to
d567640
Compare
386276f
to
d7a302c
Compare
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
packages/database/supabase/schemas/embedding.sql (2)
31-51
: Scoped embedding view LGTMJoins and my_space_ids scoping look correct; filtering out obsolete embeddings is good.
Minor: ensure an appropriate pgvector index (e.g., ivfflat on ContentEmbedding_openai_text_embedding_3_small_1536.vector) exists and is tuned for your K and probes to keep ORDER BY <=> LIMIT performant.
71-77
: Threshold + ANN ordering may reduce index usageThe WHERE similarity > match_threshold can limit ANN index effectiveness. If you observe slow plans, consider:
- relying on ORDER BY <=> LIMIT only, and applying the threshold in the caller; or
- increasing LIMIT internally (e.g., 2–4x) then filtering by threshold.
This is optional and workload‑dependent.
Also applies to: 72-74, 76-76
packages/database/supabase/migrations/20251008131345_security_views.sql (2)
59-66
: Prefer EXISTS over COUNT(...) > 0 in boolean helpersUsing EXISTS avoids full counts and can short‑circuit:
-CREATE OR REPLACE FUNCTION public.account_in_shared_space(p_account_id BIGINT) RETURNS boolean +CREATE OR REPLACE FUNCTION public.account_in_shared_space(p_account_id BIGINT) RETURNS boolean STABLE SECURITY DEFINER SET search_path = '' LANGUAGE sql AS $$ - SELECT count(sa.account_id) > 0 FROM public."SpaceAccess" AS sa - WHERE sa.account_id = p_account_id - AND sa.space_id = ANY(public.my_space_ids()); + SELECT EXISTS ( + SELECT 1 + FROM public."SpaceAccess" AS sa + WHERE sa.account_id = p_account_id + AND sa.space_id = ANY(public.my_space_ids()) + ); $$; -CREATE OR REPLACE FUNCTION public.unowned_account_in_shared_space(p_account_id BIGINT) RETURNS boolean +CREATE OR REPLACE FUNCTION public.unowned_account_in_shared_space(p_account_id BIGINT) RETURNS boolean STABLE SECURITY DEFINER SET search_path = '' LANGUAGE sql AS $$ - SELECT count(sa.account_id) > 0 FROM public."SpaceAccess" AS sa - JOIN public."PlatformAccount" AS pa ON (pa.id = sa.account_id) - WHERE sa.account_id = p_account_id - AND sa.space_id = ANY(public.my_space_ids()) - AND pa.dg_account IS NULL; + SELECT EXISTS ( + SELECT 1 + FROM public."SpaceAccess" AS sa + JOIN public."PlatformAccount" AS pa ON (pa.id = sa.account_id) + WHERE sa.account_id = p_account_id + AND sa.space_id = ANY(public.my_space_ids()) + AND pa.dg_account IS NULL + ); $$;Also applies to: 70-79
151-163
: Duplication with schema files — centralize canonical definitionsThese views/functions are also defined in schemas (content.sql, concept.sql, embedding.sql). Duplicated CREATE OR REPLACE across migrations and schema snapshots increases drift risk.
Action: keep canonical definitions in one place (prefer schemas); in migrations, only add transitional statements needed for live DB upgrades.
Also applies to: 164-180, 181-199, 201-286, 287-307, 309-363
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
packages/database/package.json
(1 hunks)packages/database/src/dbTypes.ts
(18 hunks)packages/database/src/lib/contextFunctions.ts
(1 hunks)packages/database/src/lib/queries.ts
(3 hunks)packages/database/supabase/functions/create-space/index.ts
(2 hunks)packages/database/supabase/migrations/20251008131345_security_views.sql
(1 hunks)packages/database/supabase/schemas/access_token.sql
(1 hunks)packages/database/supabase/schemas/account.sql
(4 hunks)packages/database/supabase/schemas/concept.sql
(5 hunks)packages/database/supabase/schemas/content.sql
(2 hunks)packages/database/supabase/schemas/embedding.sql
(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-30T14:49:24.016Z
Learnt from: maparent
PR: DiscourseGraphs/discourse-graph#182
File: apps/website/app/utils/supabase/dbUtils.ts:22-28
Timestamp: 2025-05-30T14:49:24.016Z
Learning: In apps/website/app/utils/supabase/dbUtils.ts, expanding the KNOWN_EMBEDDINGS and DEFAULT_DIMENSIONS mappings to support additional embedding models requires corresponding database model changes (creating new embedding tables), which should be scoped as separate work from API route implementations.
Applied to files:
packages/database/supabase/schemas/embedding.sql
📚 Learning: 2025-05-20T14:04:19.632Z
Learnt from: maparent
PR: DiscourseGraphs/discourse-graph#165
File: packages/database/supabase/schemas/embedding.sql:66-95
Timestamp: 2025-05-20T14:04:19.632Z
Learning: In the `match_embeddings_for_subset_nodes` SQL function in packages/database/supabase/schemas/embedding.sql, the number of results is implicitly limited by the length of the input array parameter `p_subset_roam_uids` since the function filters content using `WHERE c.source_local_id = ANY(p_subset_roam_uids)`.
Applied to files:
packages/database/supabase/schemas/embedding.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Supabase Preview
- GitHub Check: Supabase Preview
🔇 Additional comments (16)
packages/database/supabase/functions/create-space/index.ts (3)
117-117
: Good addition: Clean session management.Signing out the anonymous session immediately after the sign-in check ensures clean session state before proceeding with admin operations. Since the function uses the service role key for subsequent operations, the anonymous session is not needed.
Based on learnings: This aligns with the broader shift toward explicit session-state management in authentication flows.
189-189
: Minor: Formatting-only change.The semicolon placement change is stylistic and doesn't affect functionality.
197-210
: LGTM: OPTIONS handling reformatted.The preflight request handling logic remains identical, just reformatted for clarity. The origin validation and CORS headers are unchanged.
packages/database/src/lib/queries.ts (5)
117-122
: LGTM: Systematic refactor to security-scoped views.The query composition correctly shifts from direct table references to the new
my_*
views:
Document:my_documents!document_id
(line 118-119)Content:my_contents!represented_by_id
(line 121)The foreign key references and
!inner
modifiers are preserved correctly.Based on learnings: These security views significantly improve query performance versus direct table access with RLS.
124-124
: LGTM: Author reference updated to my_accounts view.The author alias now correctly references
my_accounts!author_id!inner
, consistent with the broader refactor to security-scoped views.
144-150
: LGTM: Relation subnodes updated to use my_ views.*The nested relation references are correctly updated:
Content:my_contents!represented_by_id!inner
(lines 144-146)author:my_accounts!author_id!inner
(line 149)These maintain the same query semantics while using the security-scoped views.
155-155
: Critical change: Base table switched to my_concepts view.This is the core of the refactor - all concept queries now target the
my_concepts
view instead of theConcept
table, which improves performance by avoiding RLS evaluation on the base table.Based on learnings: Security views were introduced specifically because direct table access had dismal performance, and these views improve performance significantly.
179-179
: LGTM: Filter path updated to match view alias.The filter path
Content.source_local_id
correctly references theContent
alias created at line 121 (Content:my_contents!represented_by_id
). The capitalization is intentional and matches the alias naming convention.packages/database/src/lib/contextFunctions.ts (1)
105-115
: Verifyresult2.error.context.body
type
Wrapping it innew Response(...).text()
implies it may not be a string. If it’s already a string, append directly (error += result2.error.context.body;
); otherwise, the wrapper is needed. Confirm its type and simplify accordingly.packages/database/supabase/schemas/concept.sql (2)
118-135
: my_concepts depends on my_space_ids — ensure create-orderThis view requires public.my_space_ids() to exist at creation time. Please confirm migration order guarantees the function is defined before this schema file runs (or move the view into the same migration that defines my_space_ids()).
149-159
: Co-locate composite-type views and dependent functions
Functions using thepublic.my_concepts
,public.my_contents
, andpublic.my_accounts
composite types must be created after those views exist. Either move eachCREATE FUNCTION …
block immediately after its correspondingCREATE VIEW
in the same file or enforce your migration ordering to guarantee the views are in place first.packages/database/supabase/schemas/embedding.sql (1)
102-104
: Subset function refactor LGTMSwitch to the scoped view and selecting the vector column directly looks correct.
packages/database/supabase/schemas/content.sql (3)
72-85
: my_documents view LGTMExplicit column list and my_space_ids scoping look good.
177-193
: my_contents view LGTMColumn selection and scoping align with usage in embedding queries.
194-215
: No ordering issue — views precede functions
Both public.my_documents (in content.sql line 73) and public.my_accounts (in account.sql) are defined before the functions that return their composite types.packages/database/supabase/migrations/20251008131345_security_views.sql (1)
31-41
: my_space_ids NULL handling fixed — goodUsing COALESCE on array_agg prevents NULL arrays from breaking ANY(...) checks. Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
https://linear.app/discourse-graphs/issue/ENG-950/optimize-data-access-following-the-simulation
Summary by CodeRabbit
New Features
Security
Improvements