Skip to content

Artifacts

Note: You will need to configure an artifact repository to run this example. Configuring an artifact repository here.

When running workflows, it is very common to have steps that generate or consume artifacts. Often, the output artifacts of one step may be used as input artifacts to a subsequent step.

The below workflow spec consists of two steps that run in sequence. The first step named generate-artifact will generate an artifact using the whalesay template that will be consumed by the second step named print-message that then consumes the generated artifact.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt

  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]

The whalesay template uses the cowsay command to generate a file named /tmp/hello-world.txt. It then outputs this file as an artifact named hello-art. In general, the artifact's path may be a directory rather than just a file. The print-message template takes an input artifact named message, unpacks it at the path named /tmp/message and then prints the contents of /tmp/message using the cat command. The artifact-example template passes the hello-art artifact generated as an output of the generate-artifact step as the message input artifact to the print-message step. DAG templates use the tasks prefix to refer to another task, for example {{tasks.generate-artifact.outputs.artifacts.hello-art}}.

Artifacts are packaged as Tarballs and gzipped by default. You may customize this behavior by specifying an archive strategy, using the archive field. For example:

<... snipped ...>
    outputs:
      artifacts:
        # default behavior - tar+gzip default compression.
      - name: hello-art-1
        path: /tmp/hello_world.txt

        # disable archiving entirely - upload the file / directory as is.
        # this is useful when the container layout matches the desired target repository layout.   
      - name: hello-art-2
        path: /tmp/hello_world.txt
        archive:
          none: {}

        # customize the compression behavior (disabling it here).
        # this is useful for files with varying compression benefits, 
        # e.g. disabling compression for a cached build workspace and large binaries, 
        # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
      - name: hello-art-3
        path: /tmp/hello_world.txt
        archive:
          tar:
            # no compression (also accepts the standard gzip 1 to 9 values)
            compressionLevel: 0
<... snipped ...>

Artifact Garbage Collection

As of version 3.4 you can configure your Workflow to automatically delete Artifacts that you don't need (presuming you're using S3 - other storage engines still need to be implemented).

Artifacts can be deleted OnWorkflowCompletion or OnWorkflowDeletion. You can specify your Garbage Collection strategy on both the Workflow level and the Artifact level, so for example, you may have temporary artifacts that can be deleted right away but a final output that should be persisted:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion  # default Strategy set here applies to all Artifacts by default
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact.txt
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this.txt
            artifactGC:
              strategy: Never   # optional override for an Artifact

Artifact Naming

Consider parameterizing your S3 keys by {{workflow.uid}}, etc (as shown in the example above) if there's a possibility that you could have concurrent Workflows of the same spec. This would be to avoid a scenario in which the artifact from one Workflow is being deleted while the same S3 key is being generated for a different Workflow.

Service Accounts and Annotations

Does your S3 bucket require you to run with a special Service Account or IAM Role Annotation? You can either use the same ones you use for creating artifacts or generate new ones that are specific for deletion permission. Generally users will probably just have a single Service Account or IAM Role to apply to all artifacts for the Workflow, but you can also customize on the artifact level if you need that:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion 
    ##############################################################################################
    #    Workflow Level Service Account and Metadata
    ##############################################################################################
    serviceAccountName: my-sa
    podMetadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-iam-role
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact-{{workflow.uid}}.txt
            artifactGC:
              ####################################################################################
              #    Optional override capability
              ####################################################################################
              serviceAccountName: artifact-specific-sa
              podMetadata:
                annotations:
                  eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/artifact-specific-iam-role
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this-{{workflow.uid}}.txt
            artifactGC:
              strategy: Never

If you do supply your own Service Account you will need to create a RoleBinding that binds it with the new artifactgc Role.

What happens if Garbage Collection fails?

If deletion of the artifact fails for some reason (other than the Artifact already have been deleted which is not considered a failure), the Workflow's Status will be marked with a new Condition to indicate "Artifact GC Failure", a Kubernetes Event will be issued, and the Argo Server UI will also indicate the failure. In that case, if the user needs to delete the Workflow and its child CRD objects, the user will need to patch the Workflow to remove the finalizer preventing the deletion:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
  finalizers:
  - workflows.argoproj.io/artifact-gc

The finalizer can be deleted by doing:

kubectl patch workflow my-wf \
    --type json \
    --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

Comments

Back to top