Retries¶
Argo Workflows offers a range of options for retrying failed steps.
Configuring retryStrategy in WorkflowSpec¶
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-container-
spec:
  entrypoint: retry-container
  templates:
    - name: retry-container
      retryStrategy:
        limit: "10"
      container:
        image: python:alpine3.6
        command: ["python", "-c"]
        # fail with a 66% probability
        args: ["import random; import sys; exit_code = random.choice([0, 1, 1]); sys.exit(exit_code)"]
Retry policies¶
Use retryPolicy to choose which failures to retry:
- Always: Retry all failed steps
- OnFailure: Retry steps whose main container is marked as failed in Kubernetes (this is the default)
- OnError: Retry steps that encounter Argo controller errors, or whose init or wait containers fail
- OnTransientError: Retry steps that encounter errors defined as transient, or errors matching the TRANSIENT_ERROR_PATTERN environment variable. Available in version 3.0 and later.
For example:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-on-error-
spec:
  entrypoint: error-container
  templates:
    - name: error-container
      retryStrategy:
        limit: "2"
        retryPolicy: "Always"
      container:
        image: python
        command: ["python", "-c"]
        # fail with an 80% probability
        args: ["import random; import sys; exit_code = random.choice(range(0, 5)); sys.exit(exit_code)"]
Conditional retries¶
v3.2 and after
You can also use expression to control retries. The expression field accepts an expr expression and has access to the following variables:
- lastRetry.exitCode: The exit code of the last retry, or "-1" if not available
- lastRetry.status: The phase of the last retry: Error, Failed
- lastRetry.duration: The duration of the last retry, in seconds
If expression evaluates to false, the step will not be retried.
See example for usage.
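As a minimal sketch of a conditional retry, the workflow below retries only when the last attempt exited with a code greater than 1. The template name, image, and script are illustrative. The use of asInt to convert lastRetry.exitCode from a string is an assumption about the functions available to the expression, so check it against your Argo Workflows version.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-conditional-
spec:
  entrypoint: retry-script
  templates:
    - name: retry-script
      retryStrategy:
        limit: "3"
        # Retry only while the last exit code is greater than 1.
        # asInt is assumed to be available for converting the string exit code.
        expression: "asInt(lastRetry.exitCode) > 1"
      script:
        image: python:alpine3.6
        command: ["python"]
        # Exit with 1 or 2, each with 50% probability
        source: |
          import random
          import sys
          sys.exit(random.choice([1, 2]))
```

With this expression, an attempt that exits with 2 is retried (up to the limit), while an attempt that exits with 1 ends the retries because the expression evaluates to false.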
Back-Off¶
You can configure the delay between retries with backoff. See example for usage.
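A minimal sketch of a backoff configuration follows. The field names duration, factor, and maxDuration, and the semantics given in the comments, are not stated on this page. They are assumptions to verify against the Argo Workflows field reference for your version.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-backoff-
spec:
  entrypoint: retry-backoff
  templates:
    - name: retry-backoff
      retryStrategy:
        limit: "10"
        backoff:
          duration: "1"      # assumed: initial delay; a bare number is read as seconds
          factor: "2"        # assumed: the delay is multiplied by this after each retry
          maxDuration: "1m"  # assumed: upper bound on the total time spent retrying
      container:
        image: python:alpine3.6
        command: ["python", "-c"]
        # fail with a 66% probability
        args: ["import random; import sys; exit_code = random.choice([0, 1, 1]); sys.exit(exit_code)"]
```

Under these assumptions, the delay between attempts would grow as 1s, 2s, 4s, and so on, until either limit or maxDuration is reached.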