Partition token_balances #291
Caution: Review failed. The pull request is closed.
Walkthrough: Updated ClickHouse DDL.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
📒 Files selected for processing (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2)
Line 65: Confirm the timezone for toStartOfQuarter to avoid boundary skew.
If block_timestamp is not guaranteed to be UTC at write time, quarter boundaries can shift with the server timezone. Consider normalizing explicitly:
-PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
+PARTITION BY (chain_id, toStartOfQuarter(toTimeZone(block_timestamp, 'UTC')))
Alternatively, document that block_timestamp is stored in UTC and that server timezones are aligned.
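To make the skew concrete, a quick check (hypothetical timestamp; runnable on any ClickHouse server) shows the same instant landing in different quarters depending on the timezone applied:

```sql
-- A timestamp just after midnight UTC on Jan 1 belongs to Q1 in UTC,
-- but still to Q4 of the previous year in a westward server timezone.
SELECT
    toStartOfQuarter(toTimeZone(toDateTime('2025-01-01 00:30:00', 'UTC'), 'UTC'))              AS quarter_utc,
    toStartOfQuarter(toTimeZone(toDateTime('2025-01-01 00:30:00', 'UTC'), 'America/New_York')) AS quarter_ny;
-- quarter_utc: 2025-01-01, quarter_ny: 2024-10-01
```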
Line 65: Leverage the new time-based partitions with TTL/drop guidance.
With quarterly partitions, retention and operational tasks become cheaper. Define a TTL or document a partition-drop SOP (e.g., keep 8 quarters per chain) to cap storage.
Example:
ALTER TABLE token_balances MODIFY TTL block_timestamp + INTERVAL 2 YEAR TO DISK 'cold' -- or DELETE;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.
Applied to files:
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
-PARTITION BY chain_id
+PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
Partition change won’t apply to existing deployments; add an explicit migration plan.
This file uses CREATE TABLE IF NOT EXISTS, so clusters that already have token_balances (partitioned by chain_id) will not be altered. You’ll ship a schema drift where fresh installs get (chain_id, quarter) while existing ones stay on chain_id-only.
Recommended migration (outline):
- Create a v2 table with the new partition key.
- Pause or dual-write ingestion.
- Backfill via INSERT … SELECT.
- Atomically swap tables; then drop the old one.
- Do this ON CLUSTER if applicable.
Example SQL (adapt to your deploy tooling and cluster topology):
-- 1) Create new table
CREATE TABLE token_balances_v2
(
-- same columns, indexes, projections as current token_balances
) ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
ORDER BY (chain_id, owner_address, token_address, token_id, block_number, transaction_index, log_index, direction)
SETTINGS index_granularity = 8192,
lightweight_mutation_projection_mode = 'rebuild',
deduplicate_merge_projection_mode = 'rebuild',
allow_part_offset_column_in_projections = 1;
-- 2) Backfill
INSERT INTO token_balances_v2 SELECT * FROM token_balances;
-- 3) Swap (atomic)
RENAME TABLE token_balances TO token_balances_old, token_balances_v2 TO token_balances;
-- 4) Resume writes; validate; then drop old
DROP TABLE token_balances_old;
If this file is intended only for fresh installs, add a separate numbered migration that performs the above for upgrades to avoid silent divergence.
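As a sanity check during the migration, the partition key a deployment actually uses can be read from system.tables, and the backfill can be validated by row counts before the swap (a sketch; assumes the tables live in the current database on the target server):

```sql
-- Existing deployments will show 'chain_id' here;
-- fresh installs will show the new compound key.
SELECT partition_key
FROM system.tables
WHERE database = currentDatabase() AND name = 'token_balances';

-- After the backfill, compare row counts before renaming:
SELECT
    (SELECT count() FROM token_balances)    AS old_rows,
    (SELECT count() FROM token_balances_v2) AS new_rows;
```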
🤖 Prompt for AI Agents
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql around line
65: the CREATE TABLE IF NOT EXISTS change adds a new PARTITION BY (chain_id,
toStartOfQuarter(block_timestamp)) but will not alter existing deployments that
already have token_balances partitioned only by chain_id, causing schema drift;
add an explicit migration that creates a token_balances_v2 with the new
partition key (matching all columns, indexes, projections and engine settings),
pause or dual-write ingestion if needed, backfill data via INSERT … SELECT to
token_balances_v2, perform an atomic swap (RENAME TABLE token_balances TO
token_balances_old, token_balances_v2 TO token_balances) and then resume writes
and validate before dropping token_balances_old, and ensure the migration runs
ON CLUSTER where applicable or provide this as an upgrade-only numbered
migration separate from the fresh-install DDL.
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (1)
Line 66: Partition change won’t apply in-place; ship an explicit v2 + backfill migration.
Existing tables stay partitioned by chain_id only. You’ll get schema drift between fresh installs and upgrades unless you migrate.
Minimal plan (adapt/ON CLUSTER as needed):
-- 1) Create v2 with the new partitioning
CREATE TABLE token_balances_v2 AS token_balances
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
ORDER BY (chain_id, owner_address, token_address, token_id, block_number, transaction_index, log_index, direction)
SETTINGS index_granularity = 8192,
    lightweight_mutation_projection_mode = 'rebuild',
    deduplicate_merge_projection_mode = 'rebuild',
    allow_part_offset_column_in_projections = 1;
-- 2) Backfill
INSERT INTO token_balances_v2 SELECT * FROM token_balances;
-- 3) Atomic swap
RENAME TABLE token_balances TO token_balances_old, token_balances_v2 TO token_balances;
-- 4) Validate then drop old
DROP TABLE token_balances_old;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql (1 hunks)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.
Applied to files:
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
`insert_timestamp` DateTime DEFAULT now(),
`is_deleted` UInt8 DEFAULT 0,

+INDEX idx_block_number block_number TYPE minmax GRANULARITY 1,
Add an upgrade migration to materialize the new index on existing clusters.
CREATE TABLE IF NOT EXISTS won’t retrofit this index; existing deployments will miss idx_block_number unless you run ALTER + MATERIALIZE.
Suggested numbered migration (use ON CLUSTER if applicable):
-- add index
ALTER TABLE token_transfers ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
-- build index for historical parts
ALTER TABLE token_transfers MATERIALIZE INDEX idx_block_number;
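MATERIALIZE INDEX runs asynchronously as a mutation, so the upgrade isn’t complete when the ALTER returns. A sketch for watching it finish (assumes the default database on the target server):

```sql
-- Pending mutations for the table; the index build is finished
-- once no rows remain with is_done = 0.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'token_transfers' AND is_done = 0;
```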
🤖 Prompt for AI Agents
internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql around line
20: the new idx_block_number index will not be retrofitted into existing
clusters by CREATE TABLE IF NOT EXISTS, so add an upgrade migration that runs an
ALTER TABLE to add the index if missing and then MATERIALIZE the index for
historical parts; the migration should use "ALTER TABLE ... ADD INDEX IF NOT
EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1" and then "ALTER
TABLE ... MATERIALIZE INDEX idx_block_number", and include ON CLUSTER
<cluster_name> if this repo deploys ClickHouse clusters.
`insert_timestamp` DateTime DEFAULT now(),
`is_deleted` UInt8 DEFAULT 0,

+INDEX idx_block_number block_number TYPE minmax GRANULARITY 1,
💡 Verification agent
🧩 Analysis chain
Also add an upgrade migration for the new block_number index.
Same issue: CREATE IF NOT EXISTS won’t add idx_block_number to existing tables; add ALTER + MATERIALIZE.
ALTER TABLE token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_balances MATERIALIZE INDEX idx_block_number;
Optional: keep an eye on storage overhead; minmax with GRANULARITY 1 is usually cheap and helps when queries filter by block_number without token/owner predicates.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "== Show file internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (if present) =="
if [ -f internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql ]; then
sed -n '1,240p' internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql || true
else
echo "MISSING: internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql"
fi
echo
echo "== Search for idx_block_number / token_balances / MATERIALIZE INDEX / ALTER TABLE occurrences =="
rg -n --hidden --no-ignore -S "idx_block_number|MATERIALIZE INDEX idx_block_number|MATERIALIZE INDEX|ALTER TABLE token_balances|ADD INDEX idx_block_number|ADD INDEX IF NOT EXISTS idx_block_number|token_balances" || true
echo
echo "== List files in internal/tools/clickhouse (if dir exists) =="
if [ -d internal/tools/clickhouse ]; then
ls -1 internal/tools/clickhouse | sed -n '1,200p' || true
else
echo "DIR NOT FOUND: internal/tools/clickhouse"
fi
Length of output: 5878
Add ALTER migration to materialize idx_block_number on existing tables
CREATE TABLE IF NOT EXISTS does not add new indexes to pre-existing tables — add an ALTER + MATERIALIZE migration for token_balances (and token_transfers if already deployed).
Locations:
- internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (idx_block_number at line 23)
- internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql (same index)
Suggested SQL:
ALTER TABLE token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_balances MATERIALIZE INDEX idx_block_number;
ALTER TABLE token_transfers ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_transfers MATERIALIZE INDEX idx_block_number;
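After the ALTERs run, presence of the skip index on both tables can be confirmed via the system.data_skipping_indices table (a sketch; this system table is available in recent ClickHouse versions):

```sql
-- Expect one row per table once the index exists.
SELECT table, name, type, granularity
FROM system.data_skipping_indices
WHERE database = currentDatabase()
  AND table IN ('token_balances', 'token_transfers')
  AND name = 'idx_block_number';
```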
🤖 Prompt for AI Agents
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql around line
23: CREATE TABLE IF NOT EXISTS does not materialize new indexes on existing
tables, so add an ALTER migration that adds and materializes idx_block_number
for token_balances (and do the same for token_transfers if deployed). Create a
new migration SQL (or append to existing migration series) that runs ALTER TABLE
token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax
GRANULARITY 1; followed by ALTER TABLE token_balances MATERIALIZE INDEX
idx_block_number; and likewise for token_transfers (ADD INDEX IF NOT EXISTS ...
and MATERIALIZE INDEX ...) to ensure the index is applied to pre-existing
tables.