Partition token_balances #291
Caution: Review failed. The pull request is closed.
Walkthrough: Updated ClickHouse DDL.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
📒 Files selected for processing (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2)
Line 65: Confirm the timezone for toStartOfQuarter to avoid boundary skew.
If block_timestamp is not guaranteed to be UTC at write time, quarter boundaries can shift with the server timezone. Consider normalizing explicitly:
-PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
+PARTITION BY (chain_id, toStartOfQuarter(toTimeZone(block_timestamp, 'UTC')))
Alternatively, document that block_timestamp is stored in UTC and that server timezones are aligned.
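To make the skew concrete, a quick check (hypothetical timestamp; runnable on any ClickHouse server) shows the same instant landing in different quarters depending on the timezone applied:

```sql
-- A timestamp just after midnight UTC on Jan 1 belongs to Q1 in UTC,
-- but still to Q4 of the previous year in a westward server timezone.
SELECT
    toStartOfQuarter(toTimeZone(toDateTime('2025-01-01 00:30:00', 'UTC'), 'UTC'))              AS quarter_utc,
    toStartOfQuarter(toTimeZone(toDateTime('2025-01-01 00:30:00', 'UTC'), 'America/New_York')) AS quarter_ny;
-- quarter_utc: 2025-01-01, quarter_ny: 2024-10-01
```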
Line 65: Leverage the new time-based partitions with TTL/drop guidance.
With quarterly partitions, retention and operational tasks become cheaper. Define a TTL or document a partition-drop SOP (e.g., keep 8 quarters per chain) to cap storage.
Example:
ALTER TABLE token_balances MODIFY TTL block_timestamp + INTERVAL 2 YEAR TO DISK 'cold' -- or DELETE;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.
Applied to files:
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
-PARTITION BY chain_id
+PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
Partition change won’t apply to existing deployments; add an explicit migration plan.
This file uses CREATE TABLE IF NOT EXISTS, so clusters that already have token_balances (partitioned by chain_id) will not be altered. You’ll ship a schema drift where fresh installs get (chain_id, quarter) while existing ones stay on chain_id-only.
Recommended migration (outline):
- Create a v2 table with the new partition key.
- Pause or dual-write ingestion.
- Backfill via INSERT … SELECT.
- Atomically swap tables; then drop the old one.
- Do this ON CLUSTER if applicable.
Example SQL (adapt to your deploy tooling and cluster topology):
-- 1) Create new table
CREATE TABLE token_balances_v2
(
-- same columns, indexes, projections as current token_balances
) ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
ORDER BY (chain_id, owner_address, token_address, token_id, block_number, transaction_index, log_index, direction)
SETTINGS index_granularity = 8192,
lightweight_mutation_projection_mode = 'rebuild',
deduplicate_merge_projection_mode = 'rebuild',
allow_part_offset_column_in_projections = 1;
-- 2) Backfill
INSERT INTO token_balances_v2 SELECT * FROM token_balances;
-- 3) Swap (atomic)
RENAME TABLE token_balances TO token_balances_old, token_balances_v2 TO token_balances;
-- 4) Resume writes; validate; then drop old
DROP TABLE token_balances_old;
If this file is intended only for fresh installs, add a separate numbered migration that performs the above for upgrades to avoid silent divergence.
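As a sanity check during the migration, the partition key a deployment actually uses can be read from system.tables, and the backfill can be validated by row counts before the swap (a sketch; assumes the tables live in the current database on the target server):

```sql
-- Existing deployments will show 'chain_id' here;
-- fresh installs will show the new compound key.
SELECT partition_key
FROM system.tables
WHERE database = currentDatabase() AND name = 'token_balances';

-- After the backfill, compare row counts before renaming:
SELECT
    (SELECT count() FROM token_balances)    AS old_rows,
    (SELECT count() FROM token_balances_v2) AS new_rows;
```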
🤖 Prompt for AI Agents
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql around line
65: the CREATE TABLE IF NOT EXISTS change adds a new PARTITION BY (chain_id,
toStartOfQuarter(block_timestamp)) but will not alter existing deployments that
already have token_balances partitioned only by chain_id, causing schema drift;
add an explicit migration that creates a token_balances_v2 with the new
partition key (matching all columns, indexes, projections and engine settings),
pause or dual-write ingestion if needed, backfill data via INSERT … SELECT to
token_balances_v2, perform an atomic swap (RENAME TABLE token_balances TO
token_balances_old, token_balances_v2 TO token_balances) and then resume writes
and validate before dropping token_balances_old, and ensure the migration runs
ON CLUSTER where applicable or provide this as an upgrade-only numbered
migration separate from the fresh-install DDL.
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (1)
Line 66: Partition change won’t apply in-place; ship an explicit v2 + backfill migration.
Existing tables stay partitioned by chain_id only. You’ll get schema drift between fresh installs and upgrades unless you migrate.
Minimal plan (adapt/ON CLUSTER as needed):
-- 1) Create v2 with the new partitioning
CREATE TABLE token_balances_v2 AS token_balances
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
ORDER BY (chain_id, owner_address, token_address, token_id, block_number, transaction_index, log_index, direction)
SETTINGS index_granularity = 8192,
    lightweight_mutation_projection_mode = 'rebuild',
    deduplicate_merge_projection_mode = 'rebuild',
    allow_part_offset_column_in_projections = 1;
-- 2) Backfill
INSERT INTO token_balances_v2 SELECT * FROM token_balances;
-- 3) Atomic swap
RENAME TABLE token_balances TO token_balances_old, token_balances_v2 TO token_balances;
-- 4) Validate then drop old
DROP TABLE token_balances_old;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql (1 hunks)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.
Applied to files:
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
`insert_timestamp` DateTime DEFAULT now(),
`is_deleted` UInt8 DEFAULT 0,

+INDEX idx_block_number block_number TYPE minmax GRANULARITY 1,
Add an upgrade migration to materialize the new index on existing clusters.
CREATE TABLE IF NOT EXISTS won’t retrofit this index; existing deployments will miss idx_block_number unless you run ALTER + MATERIALIZE.
Suggested numbered migration (use ON CLUSTER if applicable):
-- add index
ALTER TABLE token_transfers ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
-- build index for historical parts
ALTER TABLE token_transfers MATERIALIZE INDEX idx_block_number;
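MATERIALIZE INDEX runs asynchronously as a mutation, so the upgrade isn’t complete when the ALTER returns. A sketch for watching it finish (assumes the default database on the target server):

```sql
-- Pending mutations for the table; the index build is finished
-- once no rows remain with is_done = 0.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'token_transfers' AND is_done = 0;
```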
🤖 Prompt for AI Agents
internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql around line
20: the new idx_block_number index will not be retrofitted into existing
clusters by CREATE TABLE IF NOT EXISTS, so add an upgrade migration that runs an
ALTER TABLE to add the index if missing and then MATERIALIZE the index for
historical parts; the migration should use "ALTER TABLE ... ADD INDEX IF NOT
EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1" and then "ALTER
TABLE ... MATERIALIZE INDEX idx_block_number", and include ON CLUSTER
<cluster_name> if this repo deploys ClickHouse clusters.
`insert_timestamp` DateTime DEFAULT now(),
`is_deleted` UInt8 DEFAULT 0,

+INDEX idx_block_number block_number TYPE minmax GRANULARITY 1,
💡 Verification agent
🧩 Analysis chain
Also add an upgrade migration for the new block_number index.
Same issue: CREATE IF NOT EXISTS won’t add idx_block_number to existing tables; add ALTER + MATERIALIZE.
ALTER TABLE token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_balances MATERIALIZE INDEX idx_block_number;
Optional: keep an eye on storage overhead; minmax with GRANULARITY 1 is usually cheap and helps when queries filter by block_number without token/owner predicates.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "== Show file internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (if present) =="
if [ -f internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql ]; then
sed -n '1,240p' internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql || true
else
echo "MISSING: internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql"
fi
echo
echo "== Search for idx_block_number / token_balances / MATERIALIZE INDEX / ALTER TABLE occurrences =="
rg -n --hidden --no-ignore -S "idx_block_number|MATERIALIZE INDEX idx_block_number|MATERIALIZE INDEX|ALTER TABLE token_balances|ADD INDEX idx_block_number|ADD INDEX IF NOT EXISTS idx_block_number|token_balances" || true
echo
echo "== List files in internal/tools/clickhouse (if dir exists) =="
if [ -d internal/tools/clickhouse ]; then
ls -1 internal/tools/clickhouse | sed -n '1,200p' || true
else
echo "DIR NOT FOUND: internal/tools/clickhouse"
fi
Length of output: 5878
Add ALTER migration to materialize idx_block_number on existing tables
CREATE TABLE IF NOT EXISTS does not add new indexes to pre-existing tables — add an ALTER + MATERIALIZE migration for token_balances (and token_transfers if already deployed).
Locations:
- internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (idx_block_number at line 23)
- internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql (same index)
Suggested SQL:
ALTER TABLE token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_balances MATERIALIZE INDEX idx_block_number;
ALTER TABLE token_transfers ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_transfers MATERIALIZE INDEX idx_block_number;
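After the ALTERs run, presence of the skip index on both tables can be confirmed via the system.data_skipping_indices table (a sketch; this system table is available in recent ClickHouse versions):

```sql
-- Expect one row per table once the index exists.
SELECT table, name, type, granularity
FROM system.data_skipping_indices
WHERE database = currentDatabase()
  AND table IN ('token_balances', 'token_transfers')
  AND name = 'idx_block_number';
```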
🤖 Prompt for AI Agents
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql around line
23: CREATE TABLE IF NOT EXISTS does not materialize new indexes on existing
tables, so add an ALTER migration that adds and materializes idx_block_number
for token_balances (and do the same for token_transfers if deployed). Create a
new migration SQL (or append to existing migration series) that runs ALTER TABLE
token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax
GRANULARITY 1; followed by ALTER TABLE token_balances MATERIALIZE INDEX
idx_block_number; and likewise for token_transfers (ADD INDEX IF NOT EXISTS ...
and MATERIALIZE INDEX ...) to ensure the index is applied to pre-existing
tables.