Container Set Template
v3.1 and after
A container set template is similar to a normal container or script template, but allows you to specify multiple containers to run within a single pod.
Because you have multiple containers within a pod, they will be scheduled on the same host. You can use cheap and fast empty-dir volumes instead of persistent volume claims to share data between steps.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: container-set-template-
spec:
  entrypoint: main
  templates:
    - name: main
      volumes:
        # A cheap, fast emptyDir volume shared by every container in the pod
        - name: workspace
          emptyDir: { }
      containerSet:
        volumeMounts:
          - mountPath: /workspace
            name: workspace
        containers:
          - name: a
            image: argoproj/argosay:v2
          - name: b
            image: argoproj/argosay:v2
          # main starts only after both a and b have completed
          - name: main
            image: argoproj/argosay:v2
            dependencies:
              - a
              - b
      outputs:
        parameters:
          # Read from the shared volume; some container must write this file
          - name: message
            valueFrom:
              path: /workspace/message
```
There are a few caveats:
- You must use the Emissary Executor, or all containers must run in parallel (i.e. the graph has no dependencies).
- You cannot use enhanced depends logic.
- The pod uses the sum total of all resource requests, which may cost more than the equivalent DAG template. This is a problem if your requests are already expensive. See below.
The containers can be arranged as a graph by specifying dependencies. This is suitable for running tens of containers rather than hundreds.
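For the second case in the caveats above, a container set whose containers declare no dependencies runs them all in parallel. A minimal sketch of such a template fragment follows; the template name parallel is hypothetical and the image is only a placeholder.

```yaml
# Hypothetical template fragment (goes under spec.templates).
# No container declares dependencies, so all of them run in parallel.
- name: parallel
  containerSet:
    containers:
      - name: a
        image: argoproj/argosay:v2
      - name: b
        image: argoproj/argosay:v2
```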
Inputs and Outputs
As with the container and script templates, inputs and outputs can only be loaded from and saved to a container named main.
All container set templates that have artifacts must (or at least should) have a container named main.
If you want to use base-layer artifacts, main must be the last container to finish, so it must be the root node in the graph. That may not be practical.
Instead, use a workspace volume and make sure all artifact paths are on that volume.
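A minimal sketch of the workspace approach, assuming an artifact repository is already configured for the workflow: the output artifact path sits on the shared emptyDir volume rather than in main's base layer. The artifact name report and the file /workspace/report.txt are hypothetical, and some container in the set must write that file.

```yaml
# Hypothetical template fragment (goes under spec.templates).
- name: main
  volumes:
    - name: workspace
      emptyDir: { }
  containerSet:
    volumeMounts:
      - mountPath: /workspace
        name: workspace
    containers:
      - name: a
        image: argoproj/argosay:v2
      - name: main
        image: argoproj/argosay:v2
        dependencies:
          - a
  outputs:
    artifacts:
      # Hypothetical artifact; its path is on the shared volume, not the base layer
      - name: report
        path: /workspace/report.txt
```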
⚠️ Resource Requests
A container set actually starts all containers at once; the Emissary only starts a container's main process once the containers it depends on have completed. This means that even while a container is doing no useful work, it is still consuming resources and you are still being billed for them.
If your requests are small, this won't be a problem.
If your requests are large, set the resource requests so the sum total is the most you'll need at once.
Example A: a simple sequence, e.g. a -> b -> c
- a needs 1Gi memory
- b needs 2Gi memory
- c needs 1Gi memory
Then you know you need only a maximum of 2Gi at once. You could set the requests as follows:
- a requests 512Mi memory
- b requests 1Gi memory
- c requests 512Mi memory
The total is 2Gi, which is enough for b. We're all good.
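The Example A requests could be written on the container set's containers roughly as below. This is a sketch: the template name and image are placeholders, and it assumes each container in the set accepts the standard Kubernetes resources field.

```yaml
# Hypothetical template fragment: sequence a -> b -> c.
# Requests sum to 512Mi + 1Gi + 512Mi = 2Gi, the peak that b needs alone.
- name: main
  containerSet:
    containers:
      - name: a
        image: argoproj/argosay:v2
        resources:
          requests:
            memory: 512Mi
      - name: b
        image: argoproj/argosay:v2
        dependencies:
          - a
        resources:
          requests:
            memory: 1Gi
      - name: c
        image: argoproj/argosay:v2
        dependencies:
          - b
        resources:
          requests:
            memory: 512Mi
```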
Example B: a diamond DAG, e.g. a -> b -> d and a -> c -> d, i.e. b and c run at the same time.
- a needs 1000 cpu
- b needs 2000 cpu
- c needs 1000 cpu
- d needs 1000 cpu
I know that b and c will run at the same time, so I need to make sure the total is 3000.
- a requests 500 cpu
- b requests 1000 cpu
- c requests 1000 cpu
- d requests 500 cpu
The total is 3000, which is enough for b + c. We're all good.
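A sketch of the Example B requests as a container set. It assumes the cpu figures above are millicores (so 500 cpu is written 500m); the template name and image are placeholders.

```yaml
# Hypothetical template fragment: diamond a -> b -> d, a -> c -> d.
# Requests sum to 500m + 1000m + 1000m + 500m = 3000m, enough for b and c together.
- name: main
  containerSet:
    containers:
      - name: a
        image: argoproj/argosay:v2
        resources:
          requests:
            cpu: 500m
      - name: b
        image: argoproj/argosay:v2
        dependencies:
          - a
        resources:
          requests:
            cpu: 1000m
      - name: c
        image: argoproj/argosay:v2
        dependencies:
          - a
        resources:
          requests:
            cpu: 1000m
      - name: d
        image: argoproj/argosay:v2
        dependencies:
          - b
          - c
        resources:
          requests:
            cpu: 500m
```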
Example C: lopsided requests, e.g. a -> b where a is cheap and b is expensive
- a needs 100 cpu, 1Mi memory, runs for 10h
- b needs 8Ki GPU, 100Gi memory, 200Ki GPU, runs for 5m
Can you see the problem here? a only has small requests, but the container set will use the total of all requests. So it's as if you're using all that GPU for 10h. This will be expensive.
Solution: do not use a container set when you have lopsided requests.
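One way to follow that advice is a DAG template in which each step runs in its own pod, so only a's small request is held during its 10h run and b's large request is held only for its 5m. This is a sketch: the template names cheap and expensive are hypothetical, the image is a placeholder, and only memory requests are shown (GPU requests are omitted).

```yaml
# Hypothetical template fragments (go under spec.templates).
- name: main
  dag:
    tasks:
      - name: a
        template: cheap
      - name: b
        template: expensive
        dependencies:
          - a
# Each template below runs in its own pod with its own requests.
- name: cheap
  container:
    image: argoproj/argosay:v2
    resources:
      requests:
        memory: 1Mi
- name: expensive
  container:
    image: argoproj/argosay:v2
    resources:
      requests:
        memory: 100Gi
```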