YAML Reference¶
Pipeline configuration files use YAML syntax and must have a .yml or .yaml file extension. We recommend naming them workflow.*.yaml to make them easier to identify for the VSCode Language Server support arriving soon.
The configuration file is used to define the pipeline structure, the steps to be executed, and the deployments where the steps will run.
version¶
Version string to be used by the parser. E.g. "2".
name¶
Name for identifying the configuration.
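Together, version and name form the minimal header of any configuration; for example (the name here is just a placeholder):

version: "2"
name: "my_pipeline"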
defaults¶
Use defaults to create a map of default settings that will apply to all Work objects in the configuration.
version: "2"
name: "defaults_example"
defaults: # (1)
user: "chimefrb"
event: ["123456"]
- All Work objects created on this configuration will have
user="chimefrb"andevent=["123456"]as default values.
schedule¶
A mapping that specifies the schedule for the job. If this is not defined, the configuration will be read only once.
schedule.cronspec¶
A string that specifies the cron expression for when the job should run.
Example: Creating a pipeline with cronspec¶
version: "2"
name: "cronspec_example"
schedule:
cronspec: "30 4 * * 0" # (1)
pipeline: ...
- This pipeline configuration will run at 04:30 every Sunday.
schedule.lives¶
An integer that specifies how many times the job should be triggered by the cronspec. Default is 1. To trigger the job without a lives limit, use -1.
Note: The schedule section requires both cronspec and lives; an empty schedule can lead to errors.
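To schedule a job without a trigger limit, a minimal sketch using the -1 convention described above:

schedule:
  cronspec: "30 4 * * 0"
  lives: -1 # triggered on every cron match, with no lives limit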
Example: Creating a config with cronspec and lives¶
version: "2"
name: "cronspec_count_example"
schedule:
cronspec: "30 4 * * 0"
lives: 2 # (1)
pipeline: ...
- Same as the previous example, this will execute at 04:30 every Sunday but only 2 times. Once this limit has been reached, the
Schedulestatus will beexpired.
deployments¶
A list of mappings that define specifications for docker containers in which the configuration (or a set of specific steps) will run. When no deployments are specified, the Work objects generated by the pipeline will wait in Buckets until a worker grabs them and executes them.
deployments.<deployment>.name¶
A string used to identify the deployment.
deployments.<deployment>.site¶
A string that defines where the container will be deployed. The accepted sites for a Pipelines installation can vary, since they are defined in the pipeline workspace. #TODO add link to workspace documentation
deployments.<deployment>.image¶
Name and tag of the docker image that will be used to build the container.
deployments.<deployment>.sleep¶
deployments.<deployment>.resources¶
A mapping that defines the resources that the container will use.
deployments.<deployment>.resources.cores¶
An integer specifying the number of CPU cores for the container.
deployments.<deployment>.resources.ram¶
A string specifying the amount of RAM for the container. Example: "2G" = 2 gigabytes, "512M" = 512 megabytes.
deployments.<deployment>.resources.gpu¶
An integer specifying the number of GPU cores for the container.
deployments.<deployment>.replicas¶
An integer specifying the number of replicas for the container.
Example: Creating a config with deployments¶
version: "2"
name: example_deployments
deployments:
- name: chimefrb_small_dep
site: chime
image: chimefrb/workflow:latest
sleep: 1
resources:
cores: 2
ram: 1G
replicas: 2
- name: chimefrb_medium_dep
site: chime
image: chimefrb/workflow:latest
sleep: 1
resources:
cores: 4
ram: 2G
replicas: 1
pipeline:
...
volumes¶
A list of mappings that define specifications for docker volumes that will be used by the deployments defined in the configuration.
volumes.<volume>.name¶
Name of the volume.
volumes.<volume>.type¶
As in Docker, different volume types can be used: volume, bind, or tmpfs.
volumes.<volume>.target¶
Volume target inside the docker container generated by the deployment.
volumes.<volume>.source¶
Mount source (e.g. a volume name or a host path).
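As an illustration, a bind-type volume sketch combining the keys above (the volume name and paths are hypothetical):

volumes:
  code-bind:
    type: bind
    target: /app/data # path inside the container
    source: /home/user/data # path on the host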
volumes.<volume>.driver_config¶
Volume driver configuration. Only valid for the volume type.
volumes.<volume>.options¶
Driver options as a key-value dictionary.
networks¶
A mapping that defines specifications for docker networks that will be used by the deployments defined in the configuration.
networks.<network_name>.driver¶
Name of the driver used to create the network.
networks.<network_name>.internal¶
When set to true, restricts external access to the network.
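For illustration, a minimal networks mapping using the keys above (the network name and the bridge driver choice are assumptions for this sketch):

networks:
  workflow-net:
    driver: bridge
    internal: true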
Example: Adding volumes to a deployment¶
version: "2"
name: example_generating_calibration
defaults:
user: projectoffice
site: chime
volumes:
calibration-data-vol-testing:
type: volume
target: /data
driver_config:
driver: local
driver_opts:
type: nfs
o: nfsvers=4.0,noatime,nodiratime,soft,addr=10.5.2.21,ro
device: ":/zfsPool0"
deployments:
- name: calibration-deployment
site: chime
image: chimefrb/frb-calibration:workflow_runner
sleep: 1
resources:
cores: 2
ram: 1G
replicas: 1
volumes:
- calibration-data-vol-testing (1)
pipeline:
...
volumes list in the deployment.
pipeline¶
The main focus of a workflow pipeline configuration is the pipeline, which contains all the steps to be executed. These steps can be separated into stages; all the steps inside a stage will execute concurrently.
pipeline.runs_on¶
pipeline accepts the runs_on field when there are deployments defined in the configuration. This sets the default deployment on which the Work objects will be executed:
version: 2
name: runs-on-test
deployments:
  - name: small_deployment_1
    site: local
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
pipeline:
  runs_on: small_deployment_1 # (1)
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local

- All the Work objects generated by this pipeline will be executed on small_deployment_1.
pipeline.matrix¶
The pipeline.matrix is a tool that allows us to generate several Pipeline objects from one Config. It is a mapping where we define fields that we want to replace in our pipeline body. Example:
version: 2
name: matrix-test
pipeline:
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local

This Config will generate 2 Pipeline objects, one for each value defined in matrix.name.
pipeline.matrix.<field>.range¶
Any field inside a matrix can accept a range value that lets us define a Python-like range:
version: 2
name: matrix-test
pipeline:
  matrix:
    id:
      range: [1, 20, 2] # (1)
  steps:
    - name: check-ids
      stage: 1
      work:
        command: ["echo", "checking id", "${{ matrix.id }}"]
        user: test
        site: local

- Equivalent to Python's range(1, 20, 2): the ids 1, 3, 5, ..., 19.
pipeline.steps¶
steps is a required field for pipeline. It consists of a list of objects in the following format:
Example: Creating steps on a pipeline¶
version: 2
name: "pipeline_test"
pipeline:
  steps:
    - name: task_1 # (1)
      stage: 1
      work:
        ...
    - name: task_2
      stage: 1
      work:
        ...
    - name: task_2_1
      stage: 2
      ...

- Here task_1 is the name of the step, and its value is the map containing the keys stage and work.
pipeline.steps.<step>.name¶
Name of the current step. This will be passed to the Work.name field.
pipeline.steps.<step>.stage¶
This represents the order of execution in the pipeline configuration. E.g. in the example Creating steps on a pipeline there are 2 steps in stage 1 and one step in stage 2; a Work object will be created for all of the steps in stage 1, and stage 2 will only be executed if stage 1 has completed successfully.
pipeline.steps.<step>.work¶
All the keys under <step>.work represent the parameters for a Work object. For example:
- name: baseband-localization # (1)
  stage: 2
  work: # (2)
    parameters: # Work.parameters
    path: # Work.path
    event: # Work.event
    tags: # Work.tags
    function: # Work.function
    ...

- Step name.
- Fill this in as if you were creating a Work object on the console.
pipeline.steps.<step>.matrix¶
Just like the top-level matrix, you can define a set of values for a step; the step will generate Work objects for every combination of these values.
Example: Creating a matrix for a step¶
- name: stage_1_a
  stage: 1
  matrix: # (1)
    event: [123456, 645123]
    site: ["aro", "canfar"]
  work: # (2)
    site: ${{ matrix.site }}
    command: ["ls", "${{ matrix.event }}"]

- This matrix will generate 4 Work objects, one for each combination: [123456, "aro"], [123456, "canfar"], [645123, "aro"], [645123, "canfar"].
- The values will be replaced wherever matrix.<key_name> is specified on the Work object.
Note: You cannot use the same key in a top-level matrix and in an inner matrix; this will raise an error in your pipeline configuration.
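For instance, a configuration along these lines would be rejected, because event appears in both matrices (the names here are made up for the sketch):

pipeline:
  matrix:
    event: [123456] # event defined in the top-level matrix...
  steps:
    - name: bad-step
      stage: 1
      matrix:
        event: [645123] # ...and again in the step matrix: raises an error
      work:
        command: ["ls", "${{ matrix.event }}"]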
pipeline.steps.<step>.if¶
Use <step>.if when you need to specify conditions for a step to execute. Workflow pipelines keep a context for every configuration, which can be referenced to access values of the pipeline configuration.
Example: Running a step depending on results of previous steps¶
version: "2"
name: "conditionals_tests"
pipeline:
- name: stage_1_step_1
stage: 1
work:
...
- name: stage_1_step_2
stage: 1
work:
...
- name: stage_2_step_1 # (1)
stage: 2
if: ${{ pipeline.stage_1_step_1.status == 'success' }}
...
- This step will only be executed if
stage_1_step_1executed successfully.
Example: Running a step using internal functions¶
version: "2"
name: "internals_tests"
pipeline:
- name: stage_1_step_1
stage: 1
work: ...
- name: stage_2_step_1
stage: 2
work: ...
- name: stage_success
if: success # (1)
work: ...
- name: stage_failure
if: failure # (2)
work: ...
- name: stage_always
if: always # (3)
work: ...
successwill be true only if the whole pipeline has the statussuccess.failurewill be true only if the whole pipeline has the statusfailure.alwayswill always beTrue.
pipeline.steps.<step>.runs_on¶
Just like the top-level runs_on field, this dictates in which of the defined deployments the Work object is going to be executed.
Example: Creating config with runs_on specifications¶
version: "2"
name: example_deployments
defaults:
user: test
site: local
deployments:
- name: ld1
site: local
image: workflow_local
sleep: 1
resources:
cores: 2
ram: 1G
replicas: 2
- name: ld2
site: local
sleep: 1
image: workflow_local
resources:
cores: 4
ram: 2G
replicas: 1
pipeline:
runs_on: ld1
steps:
- name: echo
stage: 1
work:
command: ["ls", "-lah"]
- name: uname
runs_on: ld2 # (1)
stage: 2
work:
command: ["uname", "-a"]
- name: printenv
stage: 3
work:
command: ["printenv"]
- echo and printenv will execute on the pipeline-level runs_on deployment ld1; only the step uname will execute on the deployment ld2.
pipeline.steps.<step>.success_threshold¶
This key lets us define a success percentage for a step, allowing the pipeline to continue to the next stage if at least that percentage of the replicas was successful. For example, if we have a replicated step like the following:
Example: Using success_threshold on replicated step.¶
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  work:
    command: ["ls", "${{ matrix.event }}"]
Let's suppose that, of the replicas generated, we only need 60% to be successful in order to advance to the next stage; we can use success_threshold for this:
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  success_threshold: 0.6
  work:
    command: ["ls", "${{ matrix.event }}"]
Internal pipeline context¶
We can extend our configuration capabilities with variables that will only be available at runtime.
pipeline¶
We can use the pipeline context to access results from previous steps. When we use a reference to a previous step, the Pipelines backend searches for that value in its internal context. E.g.:
version: "2"
name: example_reference
defaults:
user: test
site: local
pipeline:
steps:
- name: use-function
stage: 1
matrix:
a: [1.2, 2.5]
b: [5]
event:
- [11122, 11121]
work:
function: workflow.examples.function.math
event: ${{matrix.event}}
parameters:
alpha: ${{matrix.a}}
beta: ${{matrix.b}}
- name: use-results
stage: 2
matrix:
sum: ${{pipeline.use-function.results.sum}} (1)
product: ${{pipeline.use-function.results.product}}
work:
command: ["echo ${{matrix.sum}} && echo ${{matrix.product}}"]
- name: finish
if: success
stage: 3
work:
command: ["echo", "completed"]
- We can use results from previous steps using the format ${{pipeline.
.results. }}