YAML Reference

Pipeline configuration files use YAML syntax and have a .yml or .yaml file extension. We recommend naming them workflow.*.yaml to make them easier to identify for the VSCode Language Server support arriving soon.

The configuration file is used to define the pipeline structure, the steps to be executed, and the deployments where the steps will run.

version

Version string to be used by the parser. E.g. "2".


name

Name for identifying the configuration.


defaults

Use defaults to create a map of default settings that will apply to all Work objects in the configuration.

defaults_example.yaml
version: "2"
name: "defaults_example"
defaults: # (1)
  user: "chimefrb"
  event: ["123456"]
  1. All Work objects created on this configuration will have user="chimefrb" and event=["123456"] as default values.

schedule

A mapping that specifies the schedule for the job. If this is not defined, the configuration will be read only once.

schedule.cronspec

A string that specifies the cron expression for when the job should run.

Example: Creating a pipeline with cronspec

cronspec_example.yaml
version: "2"
name: "cronspec_example"
schedule:
  cronspec: "30 4 * * 0" # (1)
pipeline: ...
  1. This pipeline configuration will run at 04:30 every Sunday.

schedule.lives

An integer that specifies how many times the job should be triggered by the cronspec. The default is 1. To trigger the job without a lives limit, use -1.

Note: The schedule section requires cronspec and lives; an empty schedule can lead to errors.

Example: Creating a config with cronspec and lives

cronspec_count_example.yaml
version: "2"
name: "cronspec_count_example"
schedule:
  cronspec: "30 4 * * 0"
  lives: 2 # (1)
pipeline: ...
  1. Same as the previous example, this will execute at 04:30 every Sunday but only 2 times. Once this limit has been reached, the Schedule status will be expired.

deployments

A list of mappings that define specifications for the Docker containers in which the configuration (or a set of specific steps) will run. When no deployments are specified, the Work objects generated by the pipeline will wait in Buckets until a worker picks them up and executes them.

deployments.<deployment>.name

A string used to identify the deployment.

deployments.<deployment>.site

A string that defines where the container will be deployed. The accepted sites for a pipelines installation can vary, since they are defined in the pipeline workspace. #TODO add link to workspace documentation

deployments.<deployment>.image

Name and tag of the Docker image that will be used to build the container.

deployments.<deployment>.sleep

deployments.<deployment>.resources

A mapping that defines the resources that the container will use.

deployments.<deployment>.resources.cores

An integer specifying the number of CPU cores for the container.

deployments.<deployment>.resources.ram

A string specifying the amount of RAM for the container. Example: "2G" = 2 gigabytes, "512M" = 512 megabytes.
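As a rough illustration, the unit suffixes could be interpreted like this (a hypothetical helper, assuming decimal units as stated above; the parser actually used by workflow may differ):

```python
# Hypothetical sketch: interpret RAM strings such as "2G" or "512M".
# Assumes decimal units ("2G" = 2 gigabytes); not the real workflow parser.
def parse_ram(value: str) -> int:
    units = {"M": 10**6, "G": 10**9}
    return int(value[:-1]) * units[value[-1].upper()]

print(parse_ram("2G"))    # 2000000000
print(parse_ram("512M"))  # 512000000
```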

deployments.<deployment>.resources.gpu

An integer specifying the number of GPU cores for the container.

deployments.<deployment>.replicas

An integer specifying the number of replicas for the container.

Example: Creating a config with deployments

YAML
version: "2"
name: example_deployments

deployments:
  - name: chimefrb_small_dep
    site: chime
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: chimefrb_medium_dep
    site: chime
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 4
      ram: 2G
    replicas: 1

pipeline:
  ...

pipeline

The core of a workflow pipeline configuration is the pipeline section, which contains all the steps to be executed. Steps can be grouped into stages; all the steps inside a stage execute concurrently.

pipeline.runs_on

pipeline accepts the runs_on field when there are deployments defined in the configuration. It sets the default deployment on which the Work objects will be executed:

YAML
version: 2
name: runs-on-test

deployments:
  - name: small_deployment_1
    site: local
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2


pipeline:
  runs_on: small_deployment_1 # (1)
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local
    ...
1. All of the steps will execute on the deployment small_deployment_1.

pipeline.matrix

The pipeline.matrix is a tool that allows us to generate several Pipeline objects from one Config. It is a mapping where we define fields that we want to replace in our pipeline body. Example:

YAML
version: 2
name: matrix-test

pipeline:
  matrix:
    name: ["user1", "user2"]
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "${{ matrix.name }}"]
        user: test
        site: local

This Config will generate 2 Pipeline objects, one for each value defined in matrix.name.
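The expansion behaves like a Cartesian product over the matrix keys. A minimal sketch of the idea (not the actual workflow implementation):

```python
from itertools import product

def expand_matrix(matrix: dict) -> list[dict]:
    # One mapping per combination of matrix values; each mapping is
    # substituted into the pipeline body to produce one Pipeline object.
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]

expand_matrix({"name": ["user1", "user2"]})
# -> [{'name': 'user1'}, {'name': 'user2'}]
```

With two keys of two values each, the same expansion yields four combinations, which is also how the per-step matrix described further below behaves.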

pipeline.matrix.<field>.range

Any value inside a matrix can accept a range value that lets us define a Python-like range:

YAML
version: 2
name: matrix-test

pipeline:
  matrix:
    id:
      range: [1, 20, 2] # (1)
  steps:
    - name: check-ids
      stage: 1
      work:
        command: ["echo", "checking id", "${{ matrix.id }}"]
        user: test
        site: local
1. This will generate the list of integers from 1 up to (but not including) 20 in steps of 2.
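Assuming the range follows Python's range(start, stop, step) semantics (stop exclusive), the example above expands to:

```python
# Python-like range: start=1, stop=20 (exclusive), step=2.
ids = list(range(1, 20, 2))
print(ids)  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```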

pipeline.steps

steps is a required field for pipeline. It consists of a list of objects in the following format:

Example: Creating steps on a pipeline

YAML
version: 2
name: "pipeline_test"
pipeline:
  steps:
    - name: task_1 # (1)
      stage: 1
      work:
        ...
    - name: task_2
      stage: 1
      work:
        ...
    - name: task_2_1
      stage: 2
      ...
  1. Here task_1 is the name of the step and its value is the map containing the keys stage and work.

pipeline.steps.<step>.name

Name of the current step. This will be passed to the Work.name field.

pipeline.steps.<step>.stage

This represents the order of execution in the pipeline configuration. E.g. in the example Creating steps on a pipeline there are two steps in stage 1 and one step in stage 2; a Work object will be created for all of the steps in stage 1, and stage 2 will only be executed if stage 1 has completed successfully.
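Conceptually, the scheduler groups steps by their stage number and only advances when the previous stage succeeded. A simplified sketch of that ordering (hypothetical helper names, not the actual workflow code):

```python
from itertools import groupby

def stage_batches(steps: list[dict]) -> list[tuple[int, list[str]]]:
    # Group step names by stage number. Each batch would run concurrently,
    # and a batch only starts if the previous one completed successfully.
    ordered = sorted(steps, key=lambda s: s["stage"])
    return [
        (stage, [s["name"] for s in group])
        for stage, group in groupby(ordered, key=lambda s: s["stage"])
    ]

stage_batches([
    {"name": "task_1", "stage": 1},
    {"name": "task_2", "stage": 1},
    {"name": "task_2_1", "stage": 2},
])
# -> [(1, ['task_1', 'task_2']), (2, ['task_2_1'])]
```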

pipeline.steps.<step>.work

All the keys under <step>.work will represent the parameters for a Work object. For example:

YAML
  - name: baseband-localization # (1)
    stage: 2
    work: # (2)
      parameters: # Work.parameters
      path: # Work.path
      event: # Work.event
      tags: # Work.tags
      function: # Work.function
      ...
  1. Step name.
  2. Fill this in as if you were creating a Work object in the console.

pipeline.steps.<step>.matrix

Just like in the top-level matrix, you can define a set of values that the step will use to generate Work objects, one for each combination of these values.

Example: Creating a matrix for a step

inner_matrix_example.yaml
- name: stage_1_a
  stage: 1
  matrix: # (1)
    event: [123456, 645123]
    site: ["aro", "canfar"]
  work: # (2)
    site: ${{ matrix.site }}
    command: ["ls", "${{ matrix.event }}"]
  1. This matrix will generate 4 Work objects, one per each combination: [123456, "aro"], [123456, "canfar"], [645123, "aro"], [645123, "canfar"]
  2. The values will be replaced where the matrix.<key_name> are specified on the Work object.

Note: You cannot use the same key in a top-level matrix and in an inner matrix. This will raise an error in your pipeline configuration.

pipeline.steps.<step>.if

Use <step>.if when you need to specify conditions for a step to execute. Workflow pipelines keep a context for every configuration that can be referenced to access several values of the pipeline configuration.

Example: Running a step depending on results of previous steps

conditionals_example.yaml
version: "2"
name: "conditionals_tests"
pipeline:
  steps:
    - name: stage_1_step_1
      stage: 1
      work:
        ...
    - name: stage_1_step_2
      stage: 1
      work:
        ...
    - name: stage_2_step_1 # (1)
      stage: 2
      if: ${{ pipeline.stage_1_step_1.status == 'success' }}
      ...
  1. This step will only be executed if stage_1_step_1 executed successfully.

Example: Running a step using internal functions

internals_example.yaml
version: "2"
name: "internals_tests"
pipeline:
  steps:
    - name: stage_1_step_1
      stage: 1
      work: ...
    - name: stage_2_step_1
      stage: 2
      work: ...
    - name: stage_success
      if: success # (1)
      work: ...
    - name: stage_failure
      if: failure # (2)
      work: ...
    - name: stage_always
      if: always # (3)
      work: ...
  1. success will be true only if the whole pipeline has the status success.
  2. failure will be true only if the whole pipeline has the status failure.
  3. always will always be true.

pipeline.steps.<step>.runs_on

Just like the top level runs_on field, this will dictate in which of the defined deployments the Work object is going to be executed.

Example: Creating config with runs_on specifications

YAML
version: "2"
name: example_deployments
defaults:
  user: test
  site: local

deployments:
  - name: ld1
    site: local
    image: workflow_local
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: ld2
    site: local
    sleep: 1
    image: workflow_local
    resources:
      cores: 4
      ram: 2G
    replicas: 1

pipeline:
  runs_on: ld1
  steps:
    - name: echo
      stage: 1
      work:
        command: ["ls", "-lah"]
    - name: uname
      runs_on: ld2 # (1)
      stage: 2
      work:
        command: ["uname", "-a"]
    - name: printenv
      stage: 3
      work:
        command: ["printenv"]
1. While the steps echo and printenv will execute on the pipeline runs_on deployment ld1, only the step uname will execute on the deployment ld2.


pipeline.steps.<step>.success_threshold

This key lets us define a success percentage for a step, allowing the pipeline to continue to the next stage if at least that fraction of the replicas were successful. For example, if we have a replicated step like the following:

Example: Using success_threshold on replicated step.

replicated.yaml
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  work:
    command: ["ls", "${{ matrix.event }}"]

Let's suppose that, of the replicas generated, we only need 60% of them to be successful to advance to the next stage; we can use success_threshold for this:

success_threshold.yaml
- name: replicated-task
  stage: 1
  matrix:
    event:
      range: [1, 1000]
  success_threshold: 0.6
  work:
    command: ["ls", "${{ matrix.event }}"]
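The threshold check amounts to comparing the fraction of successful replicas against the configured value; a hypothetical sketch of that comparison:

```python
def stage_passes(successes: int, total: int, success_threshold: float) -> bool:
    # Advance to the next stage when the success fraction meets the threshold.
    return successes / total >= success_threshold

print(stage_passes(600, 1000, 0.6))  # True
print(stage_passes(599, 1000, 0.6))  # False
```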

Internal pipeline context

We can extend our configuration capabilities with variables that will only be available at runtime.

reference

reference is a mapping used to grab values from previous steps and use them in the current one. It works with placeholders just like matrix; instead of using matrix.<some_value>, we use reference.<some_value>.

Technically, in reference we could store values like in matrix to be used in the step's work definition, but the best way to use it is alongside matrix. For example:

In this configuration, the step use-function will generate 2 Work objects whose values will be used in the use-results step. As we need the results of several Work objects, the placeholders ${{pipeline.use-function.work.results.sum}} and ${{pipeline.use-function.work.results.product}} will both be lists. These lists can be used in matrix so we replicate the use-results step based on them.

YAML
version: "2"
name: example_reference
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: use-function
      stage: 1
      matrix:
        a: [1.2, 2.5]
        b: [5]
        event:
          - [11122, 11121]
      work:
        function: workflow.examples.function.math
        event: ${{matrix.event}}
        parameters:
          alpha: ${{matrix.a}}
          beta: ${{matrix.b}}
    - name: use-results
      stage: 2
      reference: # (1)
        sum: ${{pipeline.use-function.work.results.sum}} # (2)
        product: ${{pipeline.use-function.work.results.product}}
      matrix:
        sum: ${{reference.sum}}
        product: ${{reference.product}}
      work:
        command: ["echo ${{matrix.sum}} && echo ${{matrix.product}}"] # (3)
    - name: finish
      if: success
      stage: 3
      work:
        command: ["echo", "completed"]
  1. We define reference when we want to use values from previous steps.
  2. Take the sum result from step use-function.
  3. Use the reference with the appropriate syntax.