
enh: optimize performance while scalar subquery as primary key condition #34027

Merged
guanshengliang merged 3 commits into 3.0 from enh/6617004723 on Dec 25, 2025


@dapan1121 (Contributor)

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

@gemini-code-assist (Contributor)

Summary of Changes

Hello @dapan1121, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request optimizes queries whose primary key conditions contain scalar subqueries, for example conditions on the timestamp column ts or on _c0, the pseudo-column that refers to the primary timestamp column. The changes span the query planner and the executor: the planner identifies these conditions and stores them separately, and the executor evaluates them, including cases where the subquery value comes from a remote node or needs a more complex scalar evaluation. The goal is to turn such conditions into time ranges that narrow the scan, so that less data has to be read and filtered.
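To make the planner step concrete, here is a minimal sketch of how a conjunction of WHERE conditions could be split so that primary key comparisons against a scalar subquery are set aside (in the PR they are stored in pPrimaryCond) while every other condition stays in the ordinary filter. The Node type, NodeKind values, CondSplit, MAX_CONDS and all functions below are hypothetical simplifications for illustration; the real planner works on SNode trees and its matching rules may differ. Treating "ts" and "_c0" as the primary key column names is also an assumption taken from the summary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified expression node (the real code uses SNode). */
typedef enum { NODE_COLUMN, NODE_VALUE, NODE_SUBQUERY, NODE_OPERATOR, NODE_AND } NodeKind;

typedef struct Node {
  NodeKind     kind;
  const char  *colName; /* set for NODE_COLUMN only */
  struct Node *left;    /* operands for NODE_OPERATOR / NODE_AND */
  struct Node *right;
} Node;

#define MAX_CONDS 8

typedef struct {
  const Node *primary[MAX_CONDS]; /* primary key vs. subquery comparisons */
  size_t      nPrimary;
  const Node *other[MAX_CONDS];   /* everything else stays in the filter */
  size_t      nOther;
} CondSplit;

/* Assumption: the primary key column is referenced as "ts" or "_c0". */
static bool isPrimaryKeyCol(const Node *n) {
  return n != NULL && n->kind == NODE_COLUMN && n->colName != NULL &&
         (strcmp(n->colName, "ts") == 0 || strcmp(n->colName, "_c0") == 0);
}

/* Recursively checks whether a subtree contains a scalar subquery. */
static bool containsSubquery(const Node *n) {
  if (n == NULL) return false;
  if (n->kind == NODE_SUBQUERY) return true;
  return containsSubquery(n->left) || containsSubquery(n->right);
}

/* A comparison qualifies when one side is the primary key and the other holds a subquery. */
static bool isPrimaryCondWithSubquery(const Node *n) {
  if (n == NULL || n->kind != NODE_OPERATOR) return false;
  return (isPrimaryKeyCol(n->left) && containsSubquery(n->right)) ||
         (isPrimaryKeyCol(n->right) && containsSubquery(n->left));
}

/* Walks an AND tree and sorts each conjunct into out. Returns false if a bucket overflows. */
static bool splitConds(const Node *n, CondSplit *out) {
  assert(out != NULL);
  if (n == NULL) return true;
  if (n->kind == NODE_AND) return splitConds(n->left, out) && splitConds(n->right, out);
  if (isPrimaryCondWithSubquery(n)) {
    if (out->nPrimary == MAX_CONDS) return false;
    out->primary[out->nPrimary++] = n;
  } else {
    if (out->nOther == MAX_CONDS) return false;
    out->other[out->nOther++] = n;
  }
  return true;
}
```

For a condition shaped like ts > (SELECT last(ts) FROM t2) AND c1 = 5, where t2 and c1 are hypothetical, the first conjunct lands in primary and the second in other.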

Highlights

  • Primary Key Condition Storage: Introduced a new pPrimaryCond field in SScanLogicNode and STableScanPhysiNode to specifically store primary key conditions that are derived from scalar subqueries, enabling more targeted optimization.
  • Remote Node Detection in Filters: Enhanced the filterGetTimeRange function to detect and report if a remote node is involved in the time range condition, which is crucial for handling distributed query optimizations.
  • Specialized Scalar Subquery Evaluation: Implemented scalarCalculateRemoteConstants and related logic within the scalar calculation framework to efficiently evaluate scalar subqueries that involve remote values, improving performance for such scenarios.
  • Integrated Query Execution Optimization: Integrated the new primary key condition handling into the query execution pipeline via initQueryTableDataCond, allowing the system to process scalar subqueries in ts or _c0 conditions more effectively by calculating their time ranges and merging them into the main filter conditions when appropriate.
  • Expanded Test Coverage: Added new test cases to validate the correctness and performance of scalar subqueries used with ts and _c0 conditions, including those involving aggregate functions like first() and last().
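The execution-time step described in the highlights (evaluate the subquery, derive a time range, merge it into the scan) can be sketched as follows, assuming the subquery has already produced a single integer timestamp. TimeWindowSketch, CmpOp, windowFromCond, intersectWindows and windowIsEmpty are hypothetical names; the PR's actual logic lives in initQueryTableDataCond and related helpers, which are not shown here. Because timestamps are integers, a strict bound such as ts > v becomes the inclusive start v + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive time window [skey, ekey]; skey > ekey means "empty". */
typedef struct { int64_t skey; int64_t ekey; } TimeWindowSketch;

typedef enum { OP_GT, OP_GE, OP_LT, OP_LE, OP_EQ } CmpOp;

static const TimeWindowSketch EMPTY_WINDOW = {INT64_MAX, INT64_MIN};

/* Window implied by "ts <op> value", where value is the evaluated subquery result. */
static TimeWindowSketch windowFromCond(CmpOp op, int64_t value) {
  TimeWindowSketch w = {INT64_MIN, INT64_MAX};
  switch (op) {
    case OP_GT:
      if (value == INT64_MAX) return EMPTY_WINDOW; /* nothing is greater */
      w.skey = value + 1;
      break;
    case OP_GE: w.skey = value; break;
    case OP_LT:
      if (value == INT64_MIN) return EMPTY_WINDOW; /* nothing is smaller */
      w.ekey = value - 1;
      break;
    case OP_LE: w.ekey = value; break;
    case OP_EQ: w.skey = w.ekey = value; break;
    default: assert(!"unknown operator"); break;
  }
  return w;
}

/* Merging two range conditions that are ANDed together means intersecting them. */
static TimeWindowSketch intersectWindows(TimeWindowSketch a, TimeWindowSketch b) {
  TimeWindowSketch r = {a.skey > b.skey ? a.skey : b.skey,
                        a.ekey < b.ekey ? a.ekey : b.ekey};
  return r;
}

/* An empty window lets the scan be skipped entirely. */
static bool windowIsEmpty(TimeWindowSketch w) { return w.skey > w.ekey; }
```

Intersecting a scan window of [1000, 5000] with ts > 3000 yields [3001, 5000], while intersecting it with ts <= 500 yields an empty window, so no data blocks need to be read.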


gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a performance optimization for scalar subqueries used as primary key conditions. The overall approach is sound, involving separating these conditions during planning and evaluating them at execution time. The changes are well-integrated throughout the codebase. However, I've identified a few issues that should be addressed: a typo in a function name, a high-severity error-masking bug in the new logic, and a potential high-severity race condition involving a global variable. My review includes detailed explanations and suggestions for fixing these issues.

Comment on lines +2427 to +2431
int32_t scalarCalculateRemoteConstants(SNode *pNode, SNode **pRes) {
gTaskScalarExtra.pStreamInfo = NULL;
gTaskScalarExtra.pStreamRange = NULL;
return sclCalcConstants(pNode, false, true, pRes, &gTaskScalarExtra);
}
Severity: high

This function modifies the global variable gTaskScalarExtra. If scalarCalculateRemoteConstants can be executed by multiple threads concurrently, this will cause a data race, leading to unpredictable behavior. If gTaskScalarExtra is not already thread-local, it should be made thread-local (e.g., using __thread or _Thread_local) or the necessary state should be passed as a parameter to ensure thread safety.
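A minimal sketch of the two fixes the reviewer suggests, assuming the shared state is just two pointers that must be reset before each evaluation. ScalarExtraSketch, evalConstantSketch and both calcRemoteConstants* functions are hypothetical stand-ins, not TDengine's actual gTaskScalarExtra, sclCalcConstants or scalarCalculateRemoteConstants; whether the real global is already thread-local is not shown in this excerpt. Option B uses the C11 _Thread_local storage class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-call state kept in gTaskScalarExtra. */
typedef struct {
  const void *pStreamInfo;
  const void *pStreamRange;
} ScalarExtraSketch;

/* Hypothetical evaluator standing in for sclCalcConstants(); it only reads pExtra.
   Returns 0 on success, -1 if a stream context is unexpectedly present. */
static int evalConstantSketch(int input, const ScalarExtraSketch *pExtra, int *pRes) {
  assert(pExtra != NULL && pRes != NULL);
  if (pExtra->pStreamInfo != NULL || pExtra->pStreamRange != NULL) return -1;
  *pRes = input; /* placeholder for the real constant folding */
  return 0;
}

/* Option A: state lives on the caller's stack, so concurrent calls share nothing. */
static int calcRemoteConstantsReentrant(int input, int *pRes) {
  ScalarExtraSketch extra = {NULL, NULL};
  return evalConstantSketch(input, &extra, pRes);
}

/* Option B: keep a file-scope variable, but give each thread its own copy. */
static _Thread_local ScalarExtraSketch tlsExtra;

static int calcRemoteConstantsThreadLocal(int input, int *pRes) {
  tlsExtra.pStreamInfo = NULL;  /* these writes touch only this thread's instance */
  tlsExtra.pStreamRange = NULL;
  return evalConstantSketch(input, &tlsExtra, pRes);
}
```

Option A makes the state's lifetime obvious from the call itself; option B leaves existing call sites unchanged but still relies on every caller resetting the fields before use.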

}

int32_t initQueryTableDataCond(SQueryTableDataCond* pCond, const STableScanPhysiNode* pTableScanNode,
static int32_t getPrimayTimeRange(SNode** pPrimaryKeyCond, STimeWindow* pTimeRange, bool* isStrict) {
Severity: medium

There's a typo in the function name. getPrimayTimeRange should be getPrimaryTimeRange. This should be corrected in both the function definition and its call site for better code maintainability.

static int32_t getPrimaryTimeRange(SNode** pPrimaryKeyCond, STimeWindow* pTimeRange, bool* isStrict) {

dapan1121 requested a review from feici02 as a code owner on December 25, 2025, 00:59
guanshengliang merged commit aa376a8 into 3.0 on Dec 25, 2025
13 of 18 checks passed
guanshengliang deleted the enh/6617004723 branch on December 25, 2025, 09:31