Artifacts
Note: You will need to configure an artifact repository to run this example. See the guide on configuring an artifact repository.
When running workflows, it is very common to have steps that generate or consume artifacts. Often, the output artifacts of one step may be used as input artifacts to a subsequent step.
The workflow spec below consists of two steps that run in sequence. The first step, generate-artifact, uses the whalesay template to generate an artifact. The second step, consume-artifact, uses the print-message template to consume that artifact.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt
  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
The whalesay template uses the cowsay command to generate a file named /tmp/hello_world.txt. It then outputs this file as an artifact named hello-art. In general, an artifact's path may be a directory rather than just a file. The print-message template takes an input artifact named message, unpacks it at the path /tmp/message, and then prints the contents of /tmp/message using the cat command.
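As a rough sketch of the directory case, a template could write several files into one directory and export the whole directory as a single artifact. The template name generate-directory, the artifact name results-dir, and the file names are hypothetical and only serve as illustration:

```yaml
# Hypothetical template: exports a directory instead of a single file.
- name: generate-directory
  container:
    image: alpine:latest
    command: [sh, -c]
    # write two files into /tmp/results
    args: ["mkdir -p /tmp/results && echo one > /tmp/results/a.txt && echo two > /tmp/results/b.txt"]
  outputs:
    artifacts:
    # the path points at the directory, so both files travel together
    - name: results-dir
      path: /tmp/results
```

A consuming template would declare an input artifact with a path of its own, and the directory contents would be unpacked there.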
The artifact-example template passes the hello-art artifact, generated as an output of the generate-artifact step, as the message input artifact to the consume-artifact step, which runs the print-message template. DAG templates use the tasks prefix to refer to another task, for example {{tasks.generate-artifact.outputs.artifacts.hello-art}}.
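For comparison, the same artifact hand-off might look as follows when written as a DAG template. This is a minimal sketch: the template name artifact-example-dag is hypothetical, and it assumes the whalesay and print-message templates from the workflow above are defined in the same spec.

```yaml
# Hypothetical DAG variant of the artifact-example template.
- name: artifact-example-dag
  dag:
    tasks:
    - name: generate-artifact
      template: whalesay
    - name: consume-artifact
      template: print-message
      # run only after generate-artifact has finished
      dependencies: [generate-artifact]
      arguments:
        artifacts:
        # note the tasks prefix instead of steps
        - name: message
          from: "{{tasks.generate-artifact.outputs.artifacts.hello-art}}"
```

The explicit dependencies entry takes the place of the step ordering that the steps template expresses through its nested lists.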
Artifacts are packaged as tarballs and gzipped by default. You can customize this behavior by specifying an archive strategy with the archive field. For example:
<... snipped ...>
    outputs:
      artifacts:
      # default behavior - tar+gzip default compression.
      - name: hello-art-1
        path: /tmp/hello_world.txt
      # disable archiving entirely - upload the file / directory as is.
      # this is useful when the container layout matches the desired target repository layout.
      - name: hello-art-2
        path: /tmp/hello_world.txt
        archive:
          none: {}
      # customize the compression behavior (disabling it here).
      # this is useful for files with varying compression benefits,
      # e.g. disabling compression for a cached build workspace and large binaries,
      # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
      - name: hello-art-3
        path: /tmp/hello_world.txt
        archive:
          tar:
            # no compression (also accepts the standard gzip 1 to 9 values)
            compressionLevel: 0
<... snipped ...>
Artifact Garbage Collection
As of version 3.4, you can configure your Workflow to automatically delete Artifacts that you no longer need. This currently works only with S3; support for other storage engines still needs to be implemented.
Artifacts can be deleted OnWorkflowCompletion or OnWorkflowDeletion. You can specify your Garbage Collection strategy at both the Workflow level and the Artifact level. For example, you may have temporary artifacts that can be deleted right away but a final output that should be persisted:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion  # default Strategy set here applies to all Artifacts by default
  templates:
  - name: main
    container:
      image: argoproj/argosay:v2
      command:
      - sh
      - -c
      args:
      - |
        echo "can throw this away" > /tmp/temporary-artifact.txt
        echo "keep this" > /tmp/keep-this.txt
    outputs:
      artifacts:
      - name: temporary-artifact
        path: /tmp/temporary-artifact.txt
        s3:
          key: temporary-artifact.txt
      - name: keep-this
        path: /tmp/keep-this.txt
        s3:
          key: keep-this.txt
        artifactGC:
          strategy: Never   # optional override for an Artifact
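The example above relies on the Workflow-level OnWorkflowDeletion default for the temporary artifact. If a temporary artifact should instead be removed as soon as the Workflow finishes, a per-artifact override along the following lines may be used. This is a sketch of the outputs section only; it assumes the rest of the Workflow is unchanged, and the key suffix is illustrative.

```yaml
outputs:
  artifacts:
  - name: temporary-artifact
    path: /tmp/temporary-artifact.txt
    s3:
      # illustrative key; parameterized to avoid clashes between Workflows
      key: temporary-artifact-{{workflow.uid}}.txt
    artifactGC:
      # overrides the Workflow-level default for this artifact only
      strategy: OnWorkflowCompletion
  - name: keep-this
    path: /tmp/keep-this.txt
    s3:
      key: keep-this-{{workflow.uid}}.txt
    artifactGC:
      strategy: Never
```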
Artifact Naming
Consider parameterizing your S3 keys with {{workflow.uid}} or a similar variable (as in the example under Service Accounts and Annotations below) if concurrent Workflows of the same spec are possible. This avoids a scenario in which the artifact from one Workflow is being deleted while the same S3 key is being generated for a different Workflow.
Service Accounts and Annotations
Does your S3 bucket require you to run with a special Service Account or IAM Role annotation? You can either use the same ones you use for creating artifacts or create new ones that are specific to deletion permissions. Most users will have a single Service Account or IAM Role that applies to all artifacts in the Workflow, but you can also customize at the artifact level if needed:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion
    ##############################################################################################
    #    Workflow Level Service Account and Metadata
    ##############################################################################################
    serviceAccountName: my-sa
    podMetadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-iam-role
  templates:
  - name: main
    container:
      image: argoproj/argosay:v2
      command:
      - sh
      - -c
      args:
      - |
        echo "can throw this away" > /tmp/temporary-artifact.txt
        echo "keep this" > /tmp/keep-this.txt
    outputs:
      artifacts:
      - name: temporary-artifact
        path: /tmp/temporary-artifact.txt
        s3:
          key: temporary-artifact-{{workflow.uid}}.txt
        artifactGC:
          ####################################################################################
          #    Optional override capability
          ####################################################################################
          serviceAccountName: artifact-specific-sa
          podMetadata:
            annotations:
              eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/artifact-specific-iam-role
      - name: keep-this
        path: /tmp/keep-this.txt
        s3:
          key: keep-this-{{workflow.uid}}.txt
        artifactGC:
          strategy: Never
If you do supply your own Service Account, you will need to create a RoleBinding that binds it with the new artifactgc Role.
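A RoleBinding for this purpose might look roughly like the sketch below. It assumes the Role is named artifactgc and already exists in the same namespace as the Workflow; the binding name my-sa-artifactgc and the namespace argo are hypothetical placeholders, and my-sa is the Service Account from the example above.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-sa-artifactgc   # hypothetical name
  namespace: argo          # assumption: namespace where the Workflow runs
subjects:
# the Service Account referenced by artifactGC.serviceAccountName
- kind: ServiceAccount
  name: my-sa
  namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: artifactgc         # assumption: the Role installed with Argo Workflows
```

If an artifact-level Service Account such as artifact-specific-sa is used as well, it would need a binding of its own or an additional entry under subjects.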
What happens if Garbage Collection fails?
If deletion of an artifact fails for some reason (other than the Artifact having already been deleted, which is not considered a failure), the Workflow's Status will be marked with a new Condition to indicate "Artifact GC Failure", a Kubernetes Event will be issued, and the Argo Server UI will also indicate the failure. In that case, if the user needs to delete the Workflow and its child CRD objects, the user will need to patch the Workflow to remove the finalizer preventing the deletion:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  finalizers:
  - workflows.argoproj.io/artifact-gc
The finalizer can be removed with the following command, which deletes the entire finalizers list of a Workflow named my-wf:
kubectl patch workflow my-wf \
--type json \
--patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'