With Citus 11.3, I added a node and triggered a rebalance. The rebalance was scheduled correctly but never starts running, even though the job has 1 runnable task (and 10 blocked ones).
I'm using the Docker image citusdata/citus:11.3 on all nodes. The connection between the nodes works (the primary is at 10.132.0.2):
SELECT * FROM citus_get_active_worker_nodes();
node_name | node_port
------------+-----------
10.132.0.4 | 5432
10.132.0.5 | 5432
(2 rows)
Command history:
staging=# SELECT * from citus_add_node('10.132.0.5', 5432);
citus_add_node
----------------
10
(1 row)
Time: 623.522 ms
staging=# SELECT citus_rebalance_start();
NOTICE: Scheduled 10 moves as job 1
DETAIL: Rebalance scheduled as background job
HINT: To monitor progress, run: SELECT * FROM citus_rebalance_status();
citus_rebalance_start
-----------------------
1
(1 row)
Time: 26.101 ms
staging=# SELECT * FROM citus_rebalance_status();
job_id | state | job_type | description | started_at | finished_at | details
--------+-----------+-----------+---------------------------------+------------+-------------+--------------------------------------------------------------------
1 | scheduled | rebalance | Rebalance all colocation groups | | | {"tasks": [], "task_state_counts": {"blocked": 10, "runnable": 1}}
(1 row)
Time: 3.200 ms
staging=# SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'staging'
AND pid <> pg_backend_pid();
pg_terminate_backend
----------------------
t
t
t
t
t
t
t
t
(8 rows)
staging=# SELECT get_rebalance_table_shards_plan();
get_rebalance_table_shards_plan
-------------------------------------------------------------
(sensor_datapoint,102183,0,10.132.0.4,5432,10.132.0.5,5432)
(sensor_datapoint,102182,0,10.132.0.2,5432,10.132.0.5,5432)
(sensor_datapoint,102185,0,10.132.0.4,5432,10.132.0.5,5432)
(sensor_datapoint,102184,0,10.132.0.2,5432,10.132.0.5,5432)
(sensor_datapoint,102187,0,10.132.0.4,5432,10.132.0.5,5432)
(sensor_datapoint,102186,0,10.132.0.2,5432,10.132.0.5,5432)
(sensor_datapoint,102189,0,10.132.0.4,5432,10.132.0.5,5432)
(sensor_datapoint,102188,0,10.132.0.2,5432,10.132.0.5,5432)
(sensor_datapoint,102191,0,10.132.0.4,5432,10.132.0.5,5432)
(sensor_datapoint,102190,0,10.132.0.2,5432,10.132.0.5,5432)
(10 rows)
Time: 4.475 ms
staging=# SELECT * from pg_dist_node;
nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
1 | 0 | 10.132.0.2 | 5432 | default | t | t | primary | default | t | t
6 | 5 | 10.132.0.4 | 5432 | default | t | t | primary | default | t | t
10 | 9 | 10.132.0.5 | 5432 | default | t | t | primary | default | t | t
staging=# ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
ALTER SYSTEM
Time: 9.613 ms
staging=# SELECT pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
Time: 1.585 ms
staging=# SELECT * FROM citus_rebalance_status() \gx
-[ RECORD 1 ]-------------------------------------------------------------------
job_id | 1
state | scheduled
job_type | rebalance
description | Rebalance all colocation groups
started_at |
finished_at |
details | {"tasks": [], "task_state_counts": {"blocked": 10, "runnable": 1}}
Time: 3.033 ms
I've been waiting for a long time and nothing has changed: the job stays in the scheduled state with the same task counts.
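For anyone digging further, the queries below are a rough sketch of what could be run on the primary to see the individual tasks behind job 1 and whether any Citus background worker is alive to pick them up. They assume that pg_dist_background_task is the catalog table backing citus_rebalance_status() in 11.3 and that Citus background workers show up in pg_stat_activity with a backend_type containing "citus". Neither assumption has been verified against this cluster.

```sql
-- Assumption: pg_dist_background_task holds one row per task of a background job.
-- List every task of job 1 with its state, retry info, and any recorded message.
SELECT task_id, status, pid, retry_count, not_before, message, command
FROM pg_dist_background_task
WHERE job_id = 1
ORDER BY task_id;

-- Assumption: Citus background workers report a backend_type containing "citus".
-- If this returns no rows, nothing may be running to execute the runnable task.
SELECT pid, datname, backend_type, state
FROM pg_stat_activity
WHERE backend_type ILIKE '%citus%';
```

If the runnable task has a non-empty message or a not_before in the future, that may explain why it has not been started yet.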