Concourse as an Orchestrator of Orchestrator (TFC) with high concurrent job count #8963
Context: I'm working with a client who has a small number of job types (< 15) and a large number of instances (> 1000), with a dependency tree that pushes inputs/outputs through those workspaces (e.g. the templates might have a graph like the attached photo).

Question: My logic was that since Concourse tasks are just containers without limits (I think?), you could bin pack those jobs much more effectively, so that the resource overhead is significantly lower than with a typical SaaS orchestrator, while still keeping a first-class execution graph UI. Is there a better way to do this? Similarly, is there any guidance on the maximum number of jobs that should be presented in a graph? Would we end up killing the system if we had a graph with a thousand nodes in it?
Replies: 1 comment 2 replies
Hello,
I don't have direct experience with this. What I can say is that a task (a Concourse job contains one or more tasks) has no notion of waiting for I/O. It looks to me like you would end up with N tasks, each busy waiting, polling for this "data to come back in". This might work, but consider that each task is mapped to a container, and in my extensive experience, above roughly 220 containers a Concourse worker becomes unstable (at least with my workload).

Another way to "wait for data to come in" is to write a Concourse resource. This might work, but again a Concourse resource is a container, which by default polls by being restarted on each poll interval, so it is even more expensive than busy waiting within a task.

From my understanding of what you want to achieve, I would consider another approach, something like https://temporal.io/ (although I like the idea, I do not have first-hand experience with it).
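To put the container limit above into numbers, here is a rough back-of-the-envelope sketch. It assumes the worst case where every instance holds one container at the same time, and it treats the ~220 figure as a hard per-worker ceiling; both are simplifications, and `workers_needed` is a hypothetical helper, not anything Concourse provides.

```python
import math


def workers_needed(concurrent_containers: int, max_per_worker: int = 220) -> int:
    """Lower bound on worker count if all containers run simultaneously.

    max_per_worker defaults to the ~220 containers mentioned above; that
    number is workload-dependent, so treat it as an assumption.
    """
    if concurrent_containers < 0 or max_per_worker <= 0:
        raise ValueError("counts must be non-negative and the limit positive")
    return math.ceil(concurrent_containers / max_per_worker)


# 1000 instances, each busy waiting in its own container
print(workers_needed(1000))  # 5
```

This ignores the extra containers used by resource checks, so the real number would likely be higher.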
To me, before asking whether it would kill the system, the real question is whether a human could navigate the graph at all. For this, it is possible to use job groups, see https://concourse-ci.org/pipelines.html#schema.group_config. An example of job groups in a pipeline is the Concourse CI pipeline itself: https://ci.concourse-ci.org/teams/main/pipelines/concourse
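With a thousand jobs you would probably generate the pipeline rather than write it by hand, so the groups could be generated too. Below is a minimal sketch, assuming job names follow a hypothetical `<type>-<instance>` convention; `build_groups` and the sample names are made up for illustration. It emits JSON, which is also valid YAML, using the `name` and `jobs` fields of a group entry.

```python
import json
from collections import defaultdict


def build_groups(job_names, separator="-"):
    """Group job names by the prefix before the first separator."""
    grouped = defaultdict(list)
    for name in job_names:
        prefix = name.split(separator, 1)[0]
        grouped[prefix].append(name)
    # One group entry per prefix, sorted for stable output
    return [
        {"name": prefix, "jobs": sorted(jobs)}
        for prefix, jobs in sorted(grouped.items())
    ]


# Hypothetical job names: <type>-<instance>
jobs = ["network-prod", "network-dev", "database-prod", "database-dev"]
print(json.dumps({"groups": build_groups(jobs)}, indent=2))
```

Each group then shows up as a tab in the pipeline view, so nobody has to look at all the nodes at once.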