YAML Reference

Pipeline configuration files use YAML syntax and have a .yml or .yaml file extension. We recommend naming them workflow.*.yaml to make them easier to identify for the VSCode Language Server support arriving soon.

The configuration file is used to define the pipeline structure, the steps to be executed, and the deployments where the steps will run.

version

Version string to be used by the parser. E.g. "2".


name

Name for identifying the configuration.


defaults

Use defaults to create a map of default settings that will apply to all Work objects in the configuration.

defaults_example.yaml
version: "2"
name: "defaults_example"
defaults: # (1)
  user: "chimefrb"
  event: ["123456"]
  1. All Work objects created on this configuration will have user="chimefrb" and event=["123456"] as default values.

schedule

A mapping that specifies the schedule for the job. If this is not defined, the configuration will be read only once.

schedule.cronspec

A string that specifies the cron expression for when the job should run.

Example: Creating a pipeline with cronspec

cronspec_example.yaml
version: "2"
name: "cronspec_example"
schedule:
  cronspec: "30 4 * * 0" # (1)
pipeline: ...
  1. This pipeline configuration will run at 04:30 every Sunday.

schedule.lives

An integer that specifies how many times the job should be triggered by the cronspec. The default is 1. To trigger the job without a lives limit, use -1.

Note: The schedule section requires cronspec and lives; an empty schedule can lead to errors.

Example: Creating a config with cronspec and lives

cronspec_count_example.yaml
version: "2"
name: "cronspec_count_example"
schedule:
  cronspec: "30 4 * * 0"
  lives: 2 # (1)
pipeline: ...
  1. Same as the previous example, this will execute at 04:30 every Sunday but only 2 times. Once this limit has been reached, the Schedule status will be expired.

deployments

A list of mappings that define specifications for the Docker containers in which the configuration (or a set of specific steps) will run. When no deployments are specified, the Work objects generated by the pipeline will wait in Buckets until a worker picks them up and executes them.

deployments.<deployment>.name

A string used to identify the deployment.

deployments.<deployment>.site

A string that defines where the container will be deployed. The accepted sites for a pipelines installation can vary, since they are defined in the pipeline workspace. #TODO add link to workspace documentation

deployments.<deployment>.image

Name and tag of the Docker image that will be used to build the container.

deployments.<deployment>.sleep

deployments.<deployment>.resources

A mapping that defines the resources that the container will use.

deployments.<deployment>.resources.cores

An integer specifying the number of CPU cores for the container.

deployments.<deployment>.resources.ram

A string specifying the amount of RAM for the container. Example: "2G" = 2 gigabytes, "512M" = 512 megabytes.
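As a rough illustration, the unit suffixes could be interpreted like this (a hypothetical helper, assuming decimal units as stated above; the parser actually used by workflow may differ):

```python
# Hypothetical sketch: interpret RAM strings such as "2G" or "512M".
# Assumes decimal units ("2G" = 2 gigabytes); not the real workflow parser.
def parse_ram(value: str) -> int:
    units = {"M": 10**6, "G": 10**9}
    return int(value[:-1]) * units[value[-1].upper()]

print(parse_ram("2G"))    # 2000000000
print(parse_ram("512M"))  # 512000000
```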

deployments.<deployment>.resources.gpu

An integer specifying the number of GPU cores for the container.

deployments.<deployment>.replicas

An integer specifying the number of replicas for the container.

Example: Creating a config with deployments

YAML
version: "2"
name: example_deployments

deployments:
  - name: chimefrb_small_dep
    site: chime
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: chimefrb_medium_dep
    site: chime
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 4
      ram: 2G
    replicas: 1

pipeline:
  ...

pipeline

The core of a workflow pipeline configuration is the pipeline section, which contains all the steps to be executed. Steps can be grouped into stages; all the steps inside a stage execute concurrently.

pipeline.runs_on

pipeline accepts the runs_on field when there are deployments defined in the configuration. It sets the default deployment on which the Work objects will be executed:

YAML
version: 2
name: runs-on-test

deployments:
  - name: small_deployment_1
    site: local
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2


pipeline:
  runs_on: small_deployment_1 # (1)
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local
    ...
1. All of the steps will execute on the deployment small_deployment_1.

pipeline.matrix

The pipeline.matrix is a tool that allows us to generate several Pipeline objects from one Config. It is a mapping where we define fields that we want to replace in our pipeline body. Example:

YAML
version: 2
name: matrix-test

pipeline:
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local

This Config will generate 2 Pipeline objects, one for each value defined in matrix.name.
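The expansion behaves like a Cartesian product over the matrix keys. A minimal sketch of the idea (not the actual workflow implementation):

```python
from itertools import product

def expand_matrix(matrix: dict) -> list[dict]:
    # One mapping per combination of matrix values; each mapping is
    # substituted into the pipeline body to produce one Pipeline object.
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]

expand_matrix({"name": ["user1", "user2"]})
# -> [{'name': 'user1'}, {'name': 'user2'}]
```

With two keys of two values each, the same expansion yields four combinations, which is also how the per-step matrix described further below behaves.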

pipeline.matrix.<field>.range

Any value inside a matrix can accept a range value that lets us define a Python-like range:

YAML
version: 2
name: matrix-test

pipeline:
  matrix:
    id:
      range: [1, 20, 2] # (1)
  steps:
    - name: check-ids
      stage: 1
      work:
        command: ["echo", "checking id", "${{ matrix.id }}"]
        user: test
        site: local
1. This will generate the list of integers from 1 up to (but not including) 20 in steps of 2.
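Assuming the range follows Python's range(start, stop, step) semantics (stop exclusive), the example above expands to:

```python
# Python-like range: start=1, stop=20 (exclusive), step=2.
ids = list(range(1, 20, 2))
print(ids)  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```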

pipeline.steps

steps is a required field for pipeline. It consists of a list of objects in the following format:

Example: Creating steps on a pipeline

YAML
version: 2
name: "pipeline_test"
pipeline:
  steps:
    - name: task_1 # (1)
      stage: 1
      work:
        ...
    - name: task_2
      stage: 1
      work:
        ...
    - name: task_2_1
      stage: 2
      ...
  1. Here task_1 is the name of the step and its value is the map containing the keys stage and work.

pipeline.steps.<step>.name

Name of the current step. This will be passed to the Work.name field.

pipeline.steps.<step>.stage

This represents the order of execution in the pipeline configuration. E.g. in the example Creating steps on a pipeline there are two steps in stage 1 and one step in stage 2; a Work object will be created for all of the steps in stage 1, and stage 2 will only be executed if stage 1 has completed successfully.
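Conceptually, the scheduler groups steps by their stage number and only advances when the previous stage succeeded. A simplified sketch of that ordering (hypothetical helper names, not the actual workflow code):

```python
from itertools import groupby

def stage_batches(steps: list[dict]) -> list[tuple[int, list[str]]]:
    # Group step names by stage number. Each batch would run concurrently,
    # and a batch only starts if the previous one completed successfully.
    ordered = sorted(steps, key=lambda s: s["stage"])
    return [
        (stage, [s["name"] for s in group])
        for stage, group in groupby(ordered, key=lambda s: s["stage"])
    ]

stage_batches([
    {"name": "task_1", "stage": 1},
    {"name": "task_2", "stage": 1},
    {"name": "task_2_1", "stage": 2},
])
# -> [(1, ['task_1', 'task_2']), (2, ['task_2_1'])]
```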

pipeline.steps.<step>.work

All the keys under <step>.work will represent the parameters for a Work object. For example:

YAML
  - name: baseband-localization # (1)
    stage: 2
    work: # (2)
      parameters: # Work.parameters
      path: # Work.path
      event: # Work.event
      tags: # Work.tags
      function: # Work.function
      ...
  1. Step name.
  2. Fill this in as if you were creating a Work object in the console.

pipeline.steps.<step>.matrix

Just like in the top-level matrix, you can define a set of values that the step will use to generate Work objects, one for each combination of these values.

Example: Creating a matrix for a step

inner_matrix_example.yaml
- name: stage_1_a
  stage: 1
  matrix: # (1)
    event: [123456, 645123]
    site: ["aro", "canfar"]
  work: # (2)
    site: ${{ matrix.site }}
    command: ["ls", "${{ matrix.event }}"]
  1. This matrix will generate 4 Work objects, one per each combination: [123456, "aro"], [123456, "canfar"], [645123, "aro"], [645123, "canfar"]
  2. The values will be replaced where the matrix.<key_name> are specified on the Work object.

Note: You cannot use the same key in a top-level matrix and in an inner matrix. This will raise an error in your pipeline configuration.

pipeline.steps.<step>.if

Use <step>.if when you need to specify conditions for a step to execute. Workflow pipelines keep a context for every configuration that can be referenced to access several values of the pipeline configuration.

Example: Running a step depending on results of previous steps

conditionals_example.yaml
version: "2"
name: "conditionals_tests"
pipeline:
  steps:
    - name: stage_1_step_1
      stage: 1
      work:
        ...
    - name: stage_1_step_2
      stage: 1
      work:
        ...
    - name: stage_2_step_1 # (1)
      stage: 2
      if: ${{ pipeline.stage_1_step_1.status == 'success' }}
      ...
  1. This step will only be executed if stage_1_step_1 executed successfully.

Example: Running a step using internal functions

internals_example.yaml
version: "2"
name: "internals_tests"
pipeline:
  steps:
    - name: stage_1_step_1
      stage: 1
      work: ...
    - name: stage_2_step_1
      stage: 2
      work: ...
    - name: stage_success
      if: success # (1)
      work: ...
    - name: stage_failure
      if: failure # (2)
      work: ...
    - name: stage_always
      if: always # (3)
      work: ...
  1. success will be true only if the whole pipeline has the status success.
  2. failure will be true only if the whole pipeline has the status failure.
  3. always will always be true.

pipeline.steps.<step>.runs_on

Just like the top level runs_on field, this will dictate in which of the defined deployments the Work object is going to be executed.

Example: Creating config with runs_on specifications

YAML
version: "2"
name: example_deployments
defaults:
  user: test
  site: local

deployments:
  - name: ld1
    site: local
    image: workflow_local
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: ld2
    site: local
    sleep: 1
    image: workflow_local
    resources:
      cores: 4
      ram: 2G
    replicas: 1

pipeline:
  runs_on: ld1
  steps:
    - name: echo
      stage: 1
      work:
        command: ["ls", "-lah"]
    - name: uname
      runs_on: ld2 # (1)
      stage: 2
      work:
        command: ["uname", "-a"]
    - name: printenv
      stage: 3
      work:
        command: ["printenv"]
1. While the steps echo and printenv will execute on the pipeline runs_on deployment ld1, only the step uname will execute on the deployment ld2.


pipeline.steps.<step>.success_threshold

This key lets us define a success percentage for a step, allowing the pipeline to continue to the next stage if at least that fraction of the replicas were successful. For example, if we have a replicated step like the following:

Example: Using success_threshold on replicated step.

replicated.yaml
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  work:
    command: ["ls", "${{ matrix.event }}"]

Let's suppose that, of the replicas generated, we only need 60% of them to be successful to advance to the next stage; we can use success_threshold for this:

success_threshold.yaml
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  success_threshold: 0.6
  work:
    command: ["ls", "${{ matrix.event }}"]
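The threshold check amounts to comparing the fraction of successful replicas against the configured value; a hypothetical sketch of that comparison:

```python
def stage_passes(successes: int, total: int, success_threshold: float) -> bool:
    # Advance to the next stage when the success fraction meets the threshold.
    return successes / total >= success_threshold

print(stage_passes(600, 1000, 0.6))  # True
print(stage_passes(599, 1000, 0.6))  # False
```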

Internal pipeline context

We can extend our configuration capabilities with variables that will only be available at runtime.

reference

reference is a mapping used to grab values from previous steps and use them in the current one. It works with placeholders just like matrix; instead of using matrix.<some_value>, we use reference.<some_value>.

Technically, in reference we could store values like in matrix to be used in the step's work definition, but the best way to use it is alongside matrix. For example:

In this configuration, the step use-function will generate 2 Work objects whose values will be used in the use-results step. As we need the results of several Work objects, the placeholders ${{pipeline.use-function.work.results.sum}} and ${{pipeline.use-function.work.results.product}} will both be lists. These lists can be used in matrix so we replicate the use-results step based on them.

YAML
version: "2"
name: example_reference
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: use-function
      stage: 1
      matrix:
        a: [1.2, 2.5]
        b: [5]
        event:
          - [11122, 11121]
      work:
        function: workflow.examples.function.math
        event: ${{matrix.event}}
        parameters:
          alpha: ${{matrix.a}}
          beta: ${{matrix.b}}
    - name: use-results
      stage: 2
      reference: # (1)
        sum: ${{pipeline.use-function.work.results.sum}} # (2)
        product: ${{pipeline.use-function.work.results.product}}
      matrix:
        sum: ${{reference.sum}}
        product: ${{reference.product}}
      work:
        command: ["echo ${{matrix.sum}} && echo ${{matrix.product}}"] # (3)
    - name: finish
      if: success
      stage: 3
      work:
        command: ["echo", "completed"]
  1. We define reference when we want to use values from previous steps.
  2. Take the sum result from step use-function.
  3. Use the reference with the appropriate syntax.