-
-
Notifications
You must be signed in to change notification settings - Fork 237
jobs#index perf Issue: ORDER BY uses wrong column for existing index #1675
Description
Summary
The GoodJob admin interface (jobs#index) performs ORDER BY created_at DESC, id DESC but the database has an index on (active_job_id, created_at). This causes full table scans on large datasets instead of utilizing the existing compound index.
Problem Description
When the good_jobs table grows to 1M+ records, the admin interface can become unusable due to long query times and statement timeouts (if set).
Root Cause: Index mismatch between query pattern and available indexes
- Query uses:
ORDER BY created_at DESC, id DESC - Index available:
index_good_jobs_on_active_job_id_and_created_at (active_job_id, created_at) - Result: PostgreSQL performs expensive full table scan + sort instead of index scan
Current Behavior
-- Current query from display_all scope
SELECT * FROM good_jobs
ORDER BY created_at DESC, id DESC
LIMIT 25;
-- Execution plan shows:
-- Seq Scan on good_jobsExpected Behavior
Since id == active_job_id in GoodJob (they're set to the same UUID value), the query should utilize the existing index:
-- Optimized query that uses existing index
SELECT * FROM good_jobs
ORDER BY created_at DESC, active_job_id DESC
LIMIT 25;
-- Would use: index_good_jobs_on_active_job_id_and_created_atEnvironment
- GoodJob Version: 4.11.2
- Rails Version: 8.0.2.1
- Database: PostgreSQL 16
- Dataset Size: 1M+ jobs
Sanity Check / Reproduction Steps
GoodJob::Job.display_all(ordered_by: %w[created_at desc]).to_sqlCurrent Workaround
I've implemented a monkey patch that overrides the display_all method to use active_job_id instead of id in ORDER BY clauses:
# Temporary, overbuilt by Claude, workaround
module GoodJob::FilterableOptimizations
extend ActiveSupport::Concern
class_methods do
def display_all(ordered_by: %w[created_at desc], after_at: nil, after_id: nil)
if self == GoodJob::Job && ordered_by == %w[created_at desc]
# Use active_job_id to match existing index
query = all.order(created_at: :desc, active_job_id: :desc)
# Handle pagination with active_job_id consistently
if after_at.present? && after_id.present?
query = query.where Arel::Nodes::Grouping.new([
arel_table[:created_at], arel_table[:active_job_id]
]).lt(Arel::Nodes::Grouping.new([
bind_value(:created_at, after_at, ActiveRecord::Type::DateTime),
bind_value(:active_job_id, after_id, ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Uuid)
]))
end
query
else
super
end
end
end
end
GoodJob::Job.prepend(GoodJob::FilterableOptimizations)Results: 20+ second queries (in my environment) reduced to ~20ms.
Proposed Solution
We could add yet another index of id and created_at but I don't think the intent is to use id primarily on the Job model.
Instead we can modify the display_all scope in app/models/concerns/good_job/filterable.rb to use active_job_id instead of id for the secondary sort column when active_job_id column exists to use the existing index.
Alternative
Add primary_key definition to GoodJob::Job that points to active_job_id. This could work but may have risky implications for advisory locks and associations.
Additional Context
The (active_job_id, created_at) index appears to be designed for this exact use case, but the Filterable concern's display_all method doesn't utilize it. This suggests the intent was to optimize these queries, but the implementation doesn't align with the indexing strategy.
References
- GoodJob Source:
app/models/concerns/good_job/filterable.rb
Metadata
Metadata
Assignees
Labels
Projects
Status