YAML Reference¶
Pipeline configuration files use YAML syntax, with a `.yml` or `.yaml` file extension. We recommend naming them `workflow.*.yaml` to make them easier to identify for the upcoming VSCode Language Server support.
The configuration file defines the pipeline structure, the steps to be executed, and the deployments where the steps will run.
version¶
Version string to be used by the parser, e.g. `"2"`.
name¶
Name for identifying the configuration.
defaults¶
Use `defaults` to create a map of default settings that will apply to all `Work` objects in the configuration.
```yaml
version: "2"
name: "defaults_example"
defaults: # (1)
  user: "chimefrb"
  event: ["123456"]
```
- All `Work` objects created from this configuration will have `user="chimefrb"` and `event=["123456"]` as default values.
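Conceptually, `defaults` behaves like a shallow merge in which step-level values win. The sketch below is illustrative only: `apply_defaults` is a hypothetical helper, not part of the workflow package.

```python
# Illustrative sketch only: `apply_defaults` is a hypothetical helper,
# not part of the workflow package. It mimics how `defaults` seeds
# every Work object's fields unless the step provides its own value.
def apply_defaults(work_params, defaults):
    merged = dict(defaults)      # start from the config-wide defaults
    merged.update(work_params)   # step-level values override defaults
    return merged

defaults = {"user": "chimefrb", "event": ["123456"]}
step_work = {"command": ["echo", "hello"]}
print(apply_defaults(step_work, defaults))
# → {'user': 'chimefrb', 'event': ['123456'], 'command': ['echo', 'hello']}
```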
schedule¶
A mapping that specifies the schedule for the job. If this is not defined, the configuration will be read only once.
schedule.cronspec¶
A string that specifies the cron expression for when the job should run.
Example: Creating a pipeline with cronspec¶
```yaml
version: "2"
name: "cronspec_example"
schedule:
  cronspec: "30 4 * * 0" # (1)
pipeline: ...
```
- This pipeline configuration will run at 04:30 every Sunday.
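To see why this fires at 04:30 on Sundays, recall the five cron fields (minute, hour, day of month, month, day of week); this small Python sketch just splits the expression to label them:

```python
# The five cron fields of "30 4 * * 0", in order:
# minute, hour, day-of-month, month, day-of-week (0 = Sunday).
minute, hour, day_of_month, month, day_of_week = "30 4 * * 0".split()
print(hour, minute, day_of_week)  # → 4 30 0  (04:30, Sunday)
```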
schedule.lives¶
An integer that specifies how many times the job should be triggered by the cronspec. The default is 1. Use -1 to trigger the job without a lives count.
Note: The `schedule` section requires both `cronspec` and `lives`; leaving `schedule` empty can lead to errors.
Example: Creating a config with cronspec and lives¶
```yaml
version: "2"
name: "cronspec_count_example"
schedule:
  cronspec: "30 4 * * 0"
  lives: 2 # (1)
pipeline: ...
```
- Same as the previous example, this will execute at 04:30 every Sunday, but only 2 times. Once this limit has been reached, the `Schedule` status will be `expired`.
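The lives countdown can be pictured as a counter consumed on each cron firing. This is a toy model, not the real workflow implementation:

```python
# Illustrative sketch only: a toy Schedule that models how `lives`
# limits cron triggers (-1 means unlimited). Not the real implementation.
class Schedule:
    def __init__(self, lives):
        self.lives = lives
        self.status = "active"

    def trigger(self):
        """Consume one life per cron firing; return whether the job ran."""
        if self.status == "expired":
            return False
        if self.lives == -1:
            return True
        self.lives -= 1
        if self.lives == 0:
            self.status = "expired"
        return True

sched = Schedule(lives=2)
sched.trigger()  # first Sunday at 04:30
sched.trigger()  # second Sunday at 04:30
print(sched.status)  # → expired
```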
deployments¶
A list of mappings that define specifications for the Docker containers in which the configuration (or a set of specific steps) will run. When no deployments are specified, the `Work` objects generated by the pipeline will wait in Buckets until a worker grabs them and executes them.
deployments.<deployment>.name¶
A string used to identify the deployment.
deployments.<deployment>.site¶
A string that defines where the container will be deployed. The accepted sites for a pipelines installation can vary, since they are defined in the pipeline workspace. #TODO add link to workspace documentation
deployments.<deployment>.image¶
Name and tag of the docker image that will be used to build the container.
deployments.<deployment>.sleep¶
deployments.<deployment>.resources¶
A mapping that defines the resources that the container will use.
deployments.<deployment>.resources.cores¶
An integer specifying the number of CPU cores for the container.
deployments.<deployment>.resources.ram¶
A string specifying the amount of RAM for the container. Example: `"2G"` = 2 gigabytes, `"512M"` = 512 megabytes.
deployments.<deployment>.resources.gpu¶
An integer specifying the number of GPU cores for the container.
deployments.<deployment>.replicas¶
An integer specifying the number of replicas for a container.
Example: Creating a config with deployments¶
```yaml
version: "2"
name: example_deployments
deployments:
  - name: chimefrb_small_dep
    site: chime
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: chimefrb_medium_dep
    site: chime
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 4
      ram: 2G
    replicas: 1
pipeline:
  ...
```
pipeline¶
The main focus of a workflow pipeline configuration is the `pipeline` section, which contains all the steps to be executed. These steps can be separated into stages; all the steps inside a stage execute concurrently.
pipeline.runs_on¶
`pipeline` accepts the `runs_on` field when there are deployments defined in the configuration. This defines the default deployment on which the `Work` objects will be executed:
```yaml
version: 2
name: runs-on-test
deployments:
  - name: small_deployment_1
    site: local
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
pipeline:
  runs_on: small_deployment_1 # (1)
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local
```
- All the `Work` objects generated by this configuration will be executed on the deployment `small_deployment_1`.
pipeline.matrix¶
The `pipeline.matrix` is a tool that allows us to generate several `Pipeline` objects from one `Config`. It is a mapping where we define fields that we want to replace in our pipeline body. Example:
```yaml
version: 2
name: matrix-test
pipeline:
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local
```
This `Config` will generate 2 `Pipeline` objects, one for each value defined in `matrix.name`.
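The expansion can be pictured as taking the Cartesian product of the matrix values. This is a minimal Python sketch of that idea, not the parser's actual code:

```python
from itertools import product

# Illustrative sketch: expand a matrix mapping into one parameter set
# per combination of values, the way the parser generates Pipeline objects.
def expand_matrix(matrix):
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]

print(expand_matrix({"name": ["user1", "user2"]}))
# → [{'name': 'user1'}, {'name': 'user2'}]
```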
pipeline.matrix.<field>.range¶
Any value inside a matrix can accept a `range` value that lets us define a Python-like range:
```yaml
version: 2
name: matrix-test
pipeline:
  matrix:
    id:
      range: [1, 20, 2] # (1)
  steps:
    - name: check-ids
      stage: 1
      work:
        command: ["echo", "checking id", "${{ matrix.id }}"]
        user: test
        site: local
```
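Assuming `range` follows Python's `range(start, stop, step)` semantics as stated above, `[1, 20, 2]` expands to the odd ids from 1 through 19:

```python
# A matrix `range` of [1, 20, 2] under Python range semantics:
# start at 1, stop before 20, step by 2.
ids = list(range(1, 20, 2))
print(ids)  # → [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```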
pipeline.steps¶
`steps` is a required field for `pipeline`. It consists of a list of objects in the following format:
Example: Creating steps on a pipeline¶
```yaml
version: 2
name: "pipeline_test"
pipeline:
  steps:
    - name: task_1 # (1)
      stage: 1
      work:
        ...
    - name: task_2
      stage: 1
      work:
        ...
    - name: task_2_1
      stage: 2
      ...
```
- Here `task_1` is the name of the step; each step is a map containing the keys `stage` and `work`.
pipeline.steps.<step>.name¶
Name of the current step. This will be passed to the `Work.name` field.
pipeline.steps.<step>.stage¶
This represents the order of execution in the pipeline configuration. E.g. in the example Creating steps on a pipeline there are 2 steps on stage 1 and one step on stage 2; a `Work` object will be created for each of the steps on stage 1, and stage 2 will only be executed if stage 1 has completed successfully.
pipeline.steps.<step>.work¶
All the keys under `<step>.work` represent the parameters for a `Work` object. For example:
```yaml
- name: baseband-localization # (1)
  stage: 2
  work: # (2)
    parameters: # Work.parameters
    path: # Work.path
    event: # Work.event
    tags: # Work.tags
    function: # Work.function
    ...
```
- Step name.
- Fill this in as if you were creating a `Work` object on the console.
pipeline.steps.<step>.matrix¶
Just like in the top-level `matrix`, you can define a set of values that the step will use to generate `Work` objects, one for each combination of these values.
Example: Creating a matrix for a step¶
```yaml
- name: stage_1_a
  stage: 1
  matrix: # (1)
    event: [123456, 645123]
    site: ["aro", "canfar"]
  work: # (2)
    site: ${{ matrix.site }}
    command: ["ls", "${{ matrix.event }}"]
```
- This `matrix` will generate 4 `Work` objects, one per combination: `[123456, "aro"]`, `[123456, "canfar"]`, `[645123, "aro"]`, `[645123, "canfar"]`.
- The values will be replaced wherever `matrix.<key_name>` placeholders are specified on the `Work` object.
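The four combinations above are exactly the Cartesian product of the two value lists; a short Python sketch (illustrative only, not the parser's code) makes the count explicit:

```python
from itertools import product

# Illustrative sketch: a step-level matrix yields one Work object per
# combination of the listed values (2 events x 2 sites = 4 objects).
matrix = {"event": [123456, 645123], "site": ["aro", "canfar"]}
combos = [dict(zip(matrix, values)) for values in product(*matrix.values())]
print(len(combos))  # → 4
print(combos[0])    # → {'event': 123456, 'site': 'aro'}
```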
Note: You cannot use the same key in a top-level `matrix` and in an inner `matrix`. This will raise an error in your pipeline configuration.
pipeline.steps.<step>.if¶
Use `<step>.if` when you need to specify conditions for a step to execute. Workflow pipelines keep a context for each configuration that can be referenced to access several values of the pipeline configuration.
Example: Running a step depending on results of previous steps¶
```yaml
version: "2"
name: "conditionals_tests"
pipeline:
  - name: stage_1_step_1
    stage: 1
    work:
      ...
  - name: stage_1_step_2
    stage: 1
    work:
      ...
  - name: stage_2_step_1 # (1)
    stage: 2
    if: ${{ pipeline.stage_1_step_1.status == 'success' }}
    ...
```
- This step will only be executed if `stage_1_step_1` executed successfully.
Example: Running a step using internal functions¶
```yaml
version: "2"
name: "internals_tests"
pipeline:
  - name: stage_1_step_1
    stage: 1
    work: ...
  - name: stage_2_step_1
    stage: 2
    work: ...
  - name: stage_success
    if: success # (1)
    work: ...
  - name: stage_failure
    if: failure # (2)
    work: ...
  - name: stage_always
    if: always # (3)
    work: ...
```
- `success` will be true only if the whole pipeline has the status `success`.
- `failure` will be true only if the whole pipeline has the status `failure`.
- `always` will always be `True`.
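The three keywords can be modeled as simple predicates over the overall pipeline status. This is toy logic to clarify the semantics, not the real evaluator:

```python
# Illustrative sketch: how the built-in `if` keywords map to the
# overall pipeline status (toy logic, not the real implementation).
def evaluate_if(keyword, pipeline_status):
    if keyword == "success":
        return pipeline_status == "success"
    if keyword == "failure":
        return pipeline_status == "failure"
    if keyword == "always":
        return True
    raise ValueError(f"unknown keyword: {keyword}")

print(evaluate_if("always", "failure"))  # → True
```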
pipeline.steps.<step>.runs_on¶
Just like the top-level `runs_on` field, this dictates in which of the defined deployments the `Work` object is going to be executed.
Example: Creating config with runs_on specifications¶
```yaml
version: "2"
name: example_deployments
defaults:
  user: test
  site: local
deployments:
  - name: ld1
    site: local
    image: workflow_local
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: ld2
    site: local
    sleep: 1
    image: workflow_local
    resources:
      cores: 4
      ram: 2G
    replicas: 1
pipeline:
  runs_on: ld1
  steps:
    - name: echo
      stage: 1
      work:
        command: ["ls", "-lah"]
    - name: uname
      runs_on: ld2 # (1)
      stage: 2
      work:
        command: ["uname", "-a"]
    - name: printenv
      stage: 3
      work:
        command: ["printenv"]
```
- `echo` and `printenv` will execute on the pipeline `runs_on` deployment `ld1`; only the step `uname` will execute on the deployment `ld2`.
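The override rule is a simple fallback: use the step's `runs_on` if present, else the pipeline-level default. A hypothetical helper (not the workflow API) makes this concrete:

```python
# Illustrative sketch: a step-level `runs_on` overrides the
# pipeline-level default deployment (hypothetical helper).
def resolve_deployment(step, pipeline_default):
    return step.get("runs_on", pipeline_default)

steps = [
    {"name": "echo"},
    {"name": "uname", "runs_on": "ld2"},
    {"name": "printenv"},
]
print([resolve_deployment(s, "ld1") for s in steps])  # → ['ld1', 'ld2', 'ld1']
```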
pipeline.steps.<step>.success_threshold¶
This key lets us define a percentage of success for a step, allowing us to continue to the next stage if X percent of the replicas were successful. For example, if we have a replicated step like the following:
Example: Using success_threshold on replicated step.¶
```yaml
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  work:
    command: ["ls", "${{ matrix.event }}"]
```
Let's suppose that, of the 1000 replicas generated, we only need 60% of them to be successful to advance to the next stage. We can use `success_threshold` for this:
```yaml
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  success_threshold: 0.6
  work:
    command: ["ls", "${{ matrix.event }}"]
```
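The threshold check amounts to comparing the fraction of successful `Work` objects against `success_threshold`. A minimal sketch of that comparison (not the real scheduler code):

```python
# Illustrative sketch: advance to the next stage only when the fraction
# of successful Work objects meets the step's success_threshold.
def stage_passes(statuses, threshold):
    successes = sum(1 for s in statuses if s == "success")
    return successes / len(statuses) >= threshold

# Suppose 650 of 1000 replicas succeeded; 0.65 >= 0.6, so the stage passes.
statuses = ["success"] * 650 + ["failure"] * 350
print(stage_passes(statuses, 0.6))  # → True
```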
Internal pipeline context¶
We can extend our configuration capabilities with variables that are only available at runtime.
reference¶
`reference` is a mapping used to grab values from previous steps and use them in the current one. It works with placeholders just like `matrix`; that means instead of using `matrix.<some_value>` we use `reference.<some_value>`.
Technically, in `reference` we could store values as in `matrix` to be used in the step's work definition, but the best way to use it is alongside `matrix`, for example:
In this configuration, the step `use-function` will generate 2 work objects whose values will be used in the `use-results` step. Since we need the results of several `Work` objects, the placeholders `${{pipeline.use-function.work.results.sum}}` and `${{pipeline.use-function.work.results.product}}` will both be lists. These lists can be used in a matrix so we replicate the `use-results` step based on them.
```yaml
version: "2"
name: example_reference
defaults:
  user: test
  site: local
pipeline:
  steps:
    - name: use-function
      stage: 1
      matrix:
        a: [1.2, 2.5]
        b: [5]
        event:
          - [11122, 11121]
      work:
        function: workflow.examples.function.math
        event: ${{matrix.event}}
        parameters:
          alpha: ${{matrix.a}}
          beta: ${{matrix.b}}
    - name: use-results
      stage: 2
      reference: # (1)
        sum: ${{pipeline.use-function.work.results.sum}} # (2)
        product: ${{pipeline.use-function.work.results.product}}
      matrix:
        sum: ${{reference.sum}}
        product: ${{reference.product}}
      work:
        command: ["echo ${{matrix.sum}} && echo ${{matrix.product}}"] # (3)
    - name: finish
      if: success
      stage: 3
      work:
        command: ["echo", "completed"]
```
- We define `reference` when we want to use values from previous steps.
- Take the `sum` result from the step `use-function`.
- Use the reference with the appropriate syntax.
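Conceptually, `reference` gathers one value per previous `Work` object into a list before the matrix fans out over it. The sketch below uses made-up result numbers purely for illustration; at runtime the real values come from the pipeline context:

```python
# Illustrative sketch with made-up numbers: `reference` collects the
# result fields of the previous step's Work objects into lists, which
# the step matrix then fans out over.
previous_results = [
    {"sum": 6.2, "product": 6.0},   # hypothetical results of the 1st Work
    {"sum": 7.5, "product": 12.5},  # hypothetical results of the 2nd Work
]
reference = {
    "sum": [r["sum"] for r in previous_results],
    "product": [r["product"] for r in previous_results],
}
print(reference["sum"], reference["product"])  # → [6.2, 7.5] [6.0, 12.5]
```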