Conversation

jakeloo (Member) commented Sep 13, 2025

Summary by CodeRabbit

  • Chores
    • Optimized token balances partitioning to group by chain and calendar quarter for faster historical queries and large-range exports.
    • Added block-number indexing for token balances and token transfers to speed block-based queries and range scans.
    • Reorganized log query projections to improve filtering/sorting by address, topic, and block number, boosting query performance and scalability; no user-facing API or workflow changes.


coderabbitai bot commented Sep 13, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Updated ClickHouse DDL: added idx_block_number minmax index to token_transfers and token_balances, changed token_balances partitioning to (chain_id, toStartOfQuarter(block_timestamp)), and reorganized projections in the logs table (added two new projections, removed one).
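In DDL terms, the token-table changes amount to roughly the following fragment (a sketch assembled from the walkthrough; the full column list, ORDER BY, and SETTINGS are elided):

```sql
-- Sketch of token_balances after this change (columns elided).
CREATE TABLE IF NOT EXISTS token_balances
(
    -- ... existing columns ...
    INDEX idx_block_number block_number TYPE minmax GRANULARITY 1
)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
-- previously: PARTITION BY chain_id
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
-- ... ORDER BY and SETTINGS unchanged ...
```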

Changes

Cohort / File(s) Summary
Token tables
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql, internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql
Added INDEX idx_block_number block_number TYPE minmax GRANULARITY 1 to both tables; changed token_balances partitioning from PARTITION BY chain_id to PARTITION BY (chain_id, toStartOfQuarter(block_timestamp)). Other table definitions unchanged.
Logs table projections
internal/tools/clickhouse/0002_clickhouse_create_logs_table.sql
Added PROJECTION chain_address_block_number_full_projection (SELECT * ORDER BY chain_id, address, block_number) and PROJECTION chain_topic0_full_projection (SELECT * ORDER BY chain_id, topic_0, block_number, address); removed old chain_topic0_projection. chain_address_topic0_projection unchanged.
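For orientation, these are the query shapes the reworked projections target (a sketch; the literal addresses, topics, and block ranges are hypothetical):

```sql
-- Shape likely served by chain_address_block_number_full_projection:
SELECT *
FROM logs
WHERE chain_id = 1
  AND address = '0xdead...'   -- hypothetical contract address
ORDER BY block_number;

-- Shape likely served by chain_topic0_full_projection:
SELECT *
FROM logs
WHERE chain_id = 1
  AND topic_0 = '0xddf2...'   -- hypothetical event signature hash
  AND block_number BETWEEN 19000000 AND 19100000;
```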

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Partition token_balances #291 — modifies the same ClickHouse schema files, including the token_balances partition change and idx_block_number addition.

Suggested reviewers

  • joaquim-verges

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Partition token_balances" is concise and directly summarizes the primary change in the changeset, which updates the partitioning for the token_balances table. The provided raw summary shows the partition key was changed to (chain_id, toStartOfQuarter(block_timestamp)) and related index work, confirming the title targets the main modification. As a short, specific phrase it will be clear to teammates scanning the repository history.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 144585d and 7e4530f.

📒 Files selected for processing (1)
  • internal/tools/clickhouse/0002_clickhouse_create_logs_table.sql (1 hunks)

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2)

65-65: Confirm timezone for toStartOfQuarter to avoid boundary skew.

If block_timestamp is not guaranteed UTC at write time, quarters can shift by server TZ. Consider normalizing explicitly:

-PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
+PARTITION BY (chain_id, toStartOfQuarter(toTimeZone(block_timestamp, 'UTC')))

Alternatively, document that block_timestamp is stored in UTC and server timezones are aligned.


65-65: Leverage new time-based partitions with TTL/drop guidance.

With quarterly partitions, retention and operational tasks become cheaper. Define a TTL or document a partition-drop SOP (e.g., keep 8 quarters per chain) to cap storage.

Example:

ALTER TABLE token_balances
  MODIFY TTL block_timestamp + INTERVAL 2 YEAR TO DISK 'cold';
-- or, to delete expired rows instead of tiering them:
--   MODIFY TTL block_timestamp + INTERVAL 2 YEAR DELETE;
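For the partition-drop route, a sketch of what the SOP could look like under the new key (the chain id and quarter date below are hypothetical):

```sql
-- List active partitions with their sizes to pick drop candidates:
SELECT partition,
       sum(rows) AS rows,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'token_balances' AND active
GROUP BY partition
ORDER BY partition;

-- Drop one aged-out quarter for one chain; the partition value is the
-- (chain_id, quarter-start date) tuple produced by the partition key:
ALTER TABLE token_balances DROP PARTITION (1, '2023-01-01');
```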
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f711248 and db48264.

📒 Files selected for processing (1)
  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.

Applied to files:

  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql

)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
-PARTITION BY chain_id
+PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))

⚠️ Potential issue

Partition change won’t apply to existing deployments; add an explicit migration plan.

This file uses CREATE TABLE IF NOT EXISTS, so clusters that already have token_balances (partitioned by chain_id) will not be altered. You’ll ship schema drift: fresh installs get (chain_id, quarter) while existing deployments stay on chain_id-only.

Recommended migration (outline):

  • Create a v2 table with the new partition key.
  • Pause or dual-write ingestion.
  • Backfill via INSERT … SELECT.
  • Atomically swap tables; then drop the old one.
  • Do this ON CLUSTER if applicable.

Example SQL (adapt to your deploy tooling and cluster topology):

-- 1) Create new table
CREATE TABLE token_balances_v2
(
  -- same columns, indexes, projections as current token_balances
) ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
ORDER BY (chain_id, owner_address, token_address, token_id, block_number, transaction_index, log_index, direction)
SETTINGS index_granularity = 8192,
         lightweight_mutation_projection_mode = 'rebuild',
         deduplicate_merge_projection_mode = 'rebuild',
         allow_part_offset_column_in_projections = 1;

-- 2) Backfill
INSERT INTO token_balances_v2 SELECT * FROM token_balances;

-- 3) Swap (atomic)
RENAME TABLE token_balances TO token_balances_old, token_balances_v2 TO token_balances;

-- 4) Resume writes; validate; then drop old
DROP TABLE token_balances_old;

If this file is intended only for fresh installs, add a separate numbered migration that performs the above for upgrades to avoid silent divergence.

🤖 Prompt for AI Agents
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql around line
65: the CREATE TABLE IF NOT EXISTS change adds a new PARTITION BY (chain_id,
toStartOfQuarter(block_timestamp)) but will not alter existing deployments that
already have token_balances partitioned only by chain_id, causing schema drift;
add an explicit migration that creates a token_balances_v2 with the new
partition key (matching all columns, indexes, projections and engine settings),
pause or dual-write ingestion if needed, backfill data via INSERT … SELECT to
token_balances_v2, perform an atomic swap (RENAME TABLE token_balances TO
token_balances_old, token_balances_v2 TO token_balances) and then resume writes
and validate before dropping token_balances_old, and ensure the migration runs
ON CLUSTER where applicable or provide this as an upgrade-only numbered
migration separate from the fresh-install DDL.


@coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (1)

66-66: Partition change won’t apply in-place; ship an explicit v2 + backfill migration.

Existing tables stay partitioned by chain_id only. You’ll get schema drift between fresh installs and upgrades unless you migrate.

Minimal plan (adapt/ON CLUSTER as needed):

-- 1) Create v2 with the new partitioning
CREATE TABLE token_balances_v2 AS token_balances
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
PARTITION BY (chain_id, toStartOfQuarter(block_timestamp))
ORDER BY (chain_id, owner_address, token_address, token_id, block_number, transaction_index, log_index, direction)
SETTINGS index_granularity = 8192,
         lightweight_mutation_projection_mode = 'rebuild',
         deduplicate_merge_projection_mode = 'rebuild',
         allow_part_offset_column_in_projections = 1;

-- 2) Backfill
INSERT INTO token_balances_v2 SELECT * FROM token_balances;

-- 3) Atomic swap
RENAME TABLE token_balances TO token_balances_old, token_balances_v2 TO token_balances;

-- 4) Validate then drop old
DROP TABLE token_balances_old;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between db48264 and 144585d.

📒 Files selected for processing (2)
  • internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql (1 hunks)
  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.

Applied to files:

  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql

 `insert_timestamp` DateTime DEFAULT now(),
 `is_deleted` UInt8 DEFAULT 0,

+INDEX idx_block_number block_number TYPE minmax GRANULARITY 1,

⚠️ Potential issue

Add an upgrade migration to materialize the new index on existing clusters.

CREATE TABLE IF NOT EXISTS won’t retrofit this index; existing deployments will miss idx_block_number unless you run ALTER + MATERIALIZE.

Suggested numbered migration (use ON CLUSTER if applicable):

-- add index
ALTER TABLE token_transfers ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;

-- build index for historical parts
ALTER TABLE token_transfers MATERIALIZE INDEX idx_block_number;
🤖 Prompt for AI Agents
internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql around line
20: the new idx_block_number index will not be retrofitted into existing
clusters by CREATE TABLE IF NOT EXISTS, so add an upgrade migration that runs an
ALTER TABLE to add the index if missing and then MATERIALIZE the index for
historical parts; the migration should use "ALTER TABLE ... ADD INDEX IF NOT
EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1" and then "ALTER
TABLE ... MATERIALIZE INDEX idx_block_number", and include ON CLUSTER
<cluster_name> if this repo deploys ClickHouse clusters.

 `insert_timestamp` DateTime DEFAULT now(),
 `is_deleted` UInt8 DEFAULT 0,

+INDEX idx_block_number block_number TYPE minmax GRANULARITY 1,

💡 Verification agent

🧩 Analysis chain

Also add an upgrade migration for the new block_number index.

Same issue: CREATE IF NOT EXISTS won’t add idx_block_number to existing tables; add ALTER + MATERIALIZE.

ALTER TABLE token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_balances MATERIALIZE INDEX idx_block_number;

Optional: keep an eye on storage overhead; minmax with GRANULARITY 1 is usually cheap and helps when queries filter by block_number without token/owner predicates.
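Concretely, this is the kind of query the index helps with (a sketch; chain id and block range are hypothetical):

```sql
-- With no owner/token predicate, the minmax index on block_number lets
-- ClickHouse skip granules whose block range falls outside the filter:
SELECT count()
FROM token_balances
WHERE chain_id = 1
  AND block_number BETWEEN 18000000 AND 18001000;
```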


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "== Show file internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (if present) =="
if [ -f internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql ]; then
  sed -n '1,240p' internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql || true
else
  echo "MISSING: internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql"
fi
echo
echo "== Search for idx_block_number / token_balances / MATERIALIZE INDEX / ALTER TABLE occurrences =="
rg -n --hidden --no-ignore -S "idx_block_number|MATERIALIZE INDEX idx_block_number|MATERIALIZE INDEX|ALTER TABLE token_balances|ADD INDEX idx_block_number|ADD INDEX IF NOT EXISTS idx_block_number|token_balances" || true
echo
echo "== List files in internal/tools/clickhouse (if dir exists) =="
if [ -d internal/tools/clickhouse ]; then
  ls -1 internal/tools/clickhouse | sed -n '1,200p' || true
else
  echo "DIR NOT FOUND: internal/tools/clickhouse"
fi

Length of output: 5878


Add ALTER migration to materialize idx_block_number on existing tables

CREATE TABLE IF NOT EXISTS does not add new indexes to pre-existing tables — add an ALTER + MATERIALIZE migration for token_balances (and token_transfers if already deployed).

Locations:

  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (idx_block_number at line 23)
  • internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql (same index)

Suggested SQL:

ALTER TABLE token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_balances MATERIALIZE INDEX idx_block_number;

ALTER TABLE token_transfers ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax GRANULARITY 1;
ALTER TABLE token_transfers MATERIALIZE INDEX idx_block_number;
🤖 Prompt for AI Agents
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql around line
23: CREATE TABLE IF NOT EXISTS does not materialize new indexes on existing
tables, so add an ALTER migration that adds and materializes idx_block_number
for token_balances (and do the same for token_transfers if deployed). Create a
new migration SQL (or append to existing migration series) that runs ALTER TABLE
token_balances ADD INDEX IF NOT EXISTS idx_block_number block_number TYPE minmax
GRANULARITY 1; followed by ALTER TABLE token_balances MATERIALIZE INDEX
idx_block_number; and likewise for token_transfers (ADD INDEX IF NOT EXISTS ...
and MATERIALIZE INDEX ...) to ensure the index is applied to pre-existing
tables.

jakeloo merged commit f66f782 into main Sep 17, 2025
4 of 5 checks passed
jakeloo deleted the jl/schema-2 branch September 17, 2025 21:21