Workflow Tutorial

Installation

Step 1: Clone the Repository

First, clone the workflow repository:

git clone https://github.com/CHIMEFRB/workflow.git
cd workflow

Step 2: Install dependencies with Poetry

Install all dependencies needed to run workflow using the following command:

poetry install

This will allow you to call workflow from the command line.
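
To verify the installation, you can ask the CLI for its help text (this assumes the conventional --help flag; the exact output depends on your workflow version):

poetry run workflow --help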

Step 3: Check your workspace

All services in the workflow ecosystem need a workspace to function correctly. A workspace is a YAML file that defines several configuration values, such as base URLs, deployers, Work.config values, archive locations, and sites.

This tutorial uses the development.yml workspace located in the workflow/workspaces/ folder of the workflow repository:

YAML
workspace: development

# List the valid sites for this workspace
sites:
  - local

http:
  baseurls:
    configs: http://localhost:8001/v2
    pipelines: http://localhost:8001/v2
    schedules: http://localhost:8001/v2
    buckets: http://localhost:8004
    results: http://localhost:8005

archive:
  mounts:
    local: "/"

config:
  archive:
    plots:
      methods:
        - "bypass"
        - "copy"
        - "delete"
        - "move"
      storage: "s3"
    products:
      methods:
        - "bypass"
        - "copy"
        - "delete"
        - "move"
      storage: "posix"
    results: true
    permissions:
      posix:
        user: "user"
        group: "chime-frb-ro"
        command: "setfacl -R -m g:{group}:r {path}"

deployers:
  local:
    docker:
      client_url: unix:///var/run/docker.sock
      networks:
        workflow: # This network maps to the docker-compose.yml network name

You need to set the development workspace so workflow knows where the services' backends are located:

poetry run workflow workspace set development
There is not active workspace
Locating workspace development
Reading ./workflow/workspaces/development.yml
Workspace development set to active.

Step 4: Launch the Required Services

To launch all services required for this tutorial, bring up the stack defined in docker-compose.yml:

docker-compose up -d --build

This will launch the following services:

  • Buckets: Queue of Work objects.
  • Results: Work results storage.
  • mongo: MongoDB used by all services.
  • Pipelines API: Pipelines server that workflow will talk to.
  • Pipelines Managers: Managers server in charge of monitoring tasks.
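
To confirm that all containers came up, you can list them with Docker Compose (a standard docker-compose command, independent of workflow):

docker-compose ps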

Creating Your First Config

Now, let’s create your first config:

From the workflow folder, create a file named hello-world-config.yml and add the following content:

YAML
version: "2"
name: hello-world-test

pipeline:
  steps:
    - name: hello-world
      stage: 1
      work:
        command: ["echo", "hello", "world"]
        user: test
        site: local

In this example, we have only one step that contains a Work object to be executed in the first stage. It runs a single bash command echo "hello world". This is the most basic configuration you can create.

The main component of a configuration is the pipeline. A pipeline consists of a list of steps, and we can define their order of execution using the stage key.
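
For illustration, a hypothetical two-stage configuration could look like the sketch below; the name and commands are placeholders, not part of the tutorial files:

YAML
version: "2"
name: two-stage-example

pipeline:
  steps:
    - name: first
      stage: 1
      work:
        command: ["echo", "running stage one"]
        user: test
        site: local
    - name: second
      stage: 2
      work:
        command: ["echo", "running stage two"]
        user: test
        site: local

Steps in stage 1 run before steps in stage 2.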


Deposit a Configuration

To send your configuration to workflow, use the following command:

poetry run workflow configs deploy hello-world-config.yml
Workflow Configs
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Deploy Result ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ config: 667c2806a0e2440586c3cfdb │
│ pipelines: │
│ 667c2806a0e2440586c3cfda │
│ │
└────────────────────────────────────────────────┘

This command will return the IDs for the Config object deposited and for the Pipeline objects created based on the configuration.

The Work object created by this Config will stay in Buckets until a worker picks it up and executes it.

We can act as that worker from our own machine, executing the Work object locally with the following command:

poetry run workflow run hello-world --site local

This command will search for a Work object named hello-world at the local site and execute it.


Check the status of your pipelines

We can monitor the current status of our pipelines from the workflow CLI. For this we only need the Config name and ID:

poetry run workflow configs ps hello-world-test 667c2806a0e2440586c3cfdb
Workflow Configs
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Config: hello-world-test ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ id: 667c2806a0e2440586c3cfdb │
│ version: 1 │
│ pipelines: │
│ 667c2806a0e2440586c3cfda: ✅ │
│ deployments: None │
│ user: test │
│ │
├─────────────────────────────────────────────────────────────┤
│ Explore pipelines in detail: │
│ workflow pipelines ps <config_name> <pipeline_id> │
└─────────────────────────────────────────────────────────────┘

We can see that our pipeline has a green check mark, which means it completed successfully. To check the details of a pipeline, we can follow the example at the bottom of the configs ps output.

All pipelines generated by a Config are located under that Config.name. For example, our config is named hello-world-test, so all of the pipelines generated by this config (in this case just one) need to be queried using that name:

poetry run workflow pipelines ps hello-world-test 667c2806a0e2440586c3cfda
Workflow Pipelines
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pipeline: hello-world-test ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ id: 667c2806a0e2440586c3cfda │
│ current_stage: 1 │
│ status: success │
│ creation: 2024-06-26 10:39:02.276542 │
│ start: 2024-06-26 10:39:02.524662 │
│ stop: 2024-06-26 10:40:02.855216 │
│ steps: │
│ hello-world:✅ │
│ │
└────────────────────────────────────────────────┘

Using defaults

defaults is a quality-of-life feature that allows us to define, in a single place, a set of parameters for all of the Work objects in our pipeline.

For example, let's say that our pipeline contains 100 steps. Every Work object needs to know the user and the site, but it becomes tedious to write those values every time we add a Work object to our pipeline; for this we can use defaults:

YAML
version: "2"
name: files-test
defaults:
  user: "test"
  site: "local"  #(1)
  tags: ["tag1"]

pipeline:
  steps:
    - name: create-file
      stage: 1
      work:
        command: ["touch", "test.py", ";", "echo", "print('hello')", ">", "test.py"]
    - name: run-file
      stage: 2
      work:
        command: ["python test.py"]
    - name: finish
      stage: 3
      work:
        command: ["echo Finished"]
  1. The values defined in defaults will be applied to all of the Work objects in the pipeline.
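
Without defaults, every step would have to carry those fields itself. For example, the first step above is equivalent to writing:

YAML
    - name: create-file
      stage: 1
      work:
        command: ["touch", "test.py", ";", "echo", "print('hello')", ">", "test.py"]
        user: "test"
        site: "local"
        tags: ["tag1"]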

Extending the Work payload

As we described previously, every step contains a work payload, and this payload accepts all of the parameters that the Work object accepts.

YAML
version: "2"
name: tags-test
defaults:
  user: "test"
  site: "local"  #(1)
  tags: ["tag1"]

pipeline:
  steps:
    - name: sum
      stage: 1
      work:
        function: workflow.examples.function.math #(1)
        event: [594688]
        parameters:
          alpha: 1
          beta: 5
        notify:
          slack:
            ...
        config:
          archive:
            results: true
            products: "copy"
            plots: "move"
        priority: 4
  1. This is an example function, provided in the workflow repo.

Adding deployments

Until now, we have been running the command poetry run workflow run <buckets> to execute the Work objects that we define in our configurations, but workflow allows you to automate even this by defining deployments in your configuration file:

YAML
version: "2"
name: example_deployments
defaults:
  user: test
  site: local

deployments:
  - name: ld1 # (1)
    site: local
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 2
      ram: 1G
    replicas: 2
  - name: ld2
    site: local
    image: chimefrb/workflow:latest
    sleep: 1
    resources:
      cores: 1
      ram: 512M
    replicas: 1

pipeline:
  runs_on: ld1 # (2)
  steps:
    - name: echo
      stage: 1
      runs_on: ld2
      work:
        command: ["ls", "-lah"]
    - name: uname
      runs_on: ld2 # (3)
      stage: 2
      work:
        command: ["uname", "-a"]
    - name: printenv
      stage: 3
      work:
        command: ["printenv"]
  1. Deployment definition. This has to be referenced from our pipeline or from a step.
  2. In this example we attach the deployment to the pipeline using the runs_on key. This means all of the pipeline's steps will run on this deployment by default.
  3. We can also attach a deployment to an individual step using runs_on; the step will then run only on the specified deployment, ignoring the deployment inherited from the pipeline.

A configuration can have one or more deployments, and these deployments have to be used either on the pipeline or on a step. A deployment is simply a mapping that instructs the Pipelines backend to create a Docker container and run the specified object in it.
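
In the example above, the steps echo and uname declare runs_on: ld2, so they run in containers created from the ld2 deployment, while printenv declares no runs_on of its own and therefore inherits ld1 from the pipeline.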


Pipeline and step replication

When creating a configuration, we can define a matrix strategy on our pipeline or on our steps; this matrix will replicate the object depending on the values that we define inside of it.

A matrix is a mapping where each key can have a list of values; Pipelines will determine all of the possible combinations of those values and create one replica of the object for each combination:

YAML
pipeline:
  matrix:
    alpha: [1, 2, 3]
  steps:
    - name: replicated_step
      stage: 1
      work:
        function: workflow.examples.function.math
        parameters:
          alpha: 5
          beta: ${{matrix.alpha}} # (1)
  1. In this example we define a matrix for the pipeline (top level). This matrix will make the pipeline replicate 3 times (once for each value defined in the alpha key) and will replace the values wherever we use the substitution syntax (${{matrix.<key>}}).
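
Conceptually, the matrix above expands the pipeline into three replicas, one per value of alpha, with the substitution applied to beta:

YAML
# replica 1
parameters: {alpha: 5, beta: 1}
# replica 2
parameters: {alpha: 5, beta: 2}
# replica 3
parameters: {alpha: 5, beta: 3}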

We can also apply a matrix strategy to each of our steps:

YAML
version: "2"
name: example_matrix_steps
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: check-pids
      stage: 1
      matrix:
        pid: [256, 554, 2849]
      work:
        command: ["ps", "-p", "${{matrix.pid}}"]

Running steps with conditions

When creating a Config, we may want to execute some steps only under certain conditions. The Pipelines backend provides simple but useful conditions that can enhance your workflow. To add a condition to a specific step, use the if key:

  • success: Add this condition to a step so it runs only if the pipeline has been successful so far.
  • failure: This condition will make the step run only if the pipeline fails.
  • always: Use this condition to run the step regardless of the pipeline state.
YAML
version: "2"
name: example
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: alpha
      stage: 1
      matrix:
        alpha:
          range: 
      work:
        function: function.that.will.fail
      success_threshold: 0.7
    - name: Cleanup failed task
      if: failure() # (1)
      stage: 2
      work:
        command: ["echo", "cleaning"]
  1. Will run since the step alpha will fail.

Adding success threshold

When using a matrix on a step, we may want to add a success threshold to define the fraction of successful step replicas needed to proceed to the next stage. We can achieve this by adding success_threshold to the step that carries the matrix.

YAML
version: "2"
name: example_success_threshold
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: alpha
      matrix: # (1)
        ranged_value:
          range: [100, 110]
        message: ["message1", "message2"]
      stage: 1
      work:
        command: ["echo", "${{ matrix.message }}", "${{ matrix.ranged_value }}"]
      success_threshold: 0.7 # (2)
    - name: beta
      stage: 2
      work:
        command: ["echo", "succeed"]
  1. Matrix definition that will replicate the step alpha 20 times (10 values from ranged_value times 2 values from message).
  2. We have defined a success threshold of 0.7, meaning that at least 70% of the replicas have to succeed in order to pass to the next stage.
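
With 20 replicas and a threshold of 0.7, at least 0.7 × 20 = 14 replicas of alpha must succeed before the beta step in stage 2 is allowed to run.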

Referencing results from previous step

When creating more complex configurations, we may want to pass results from previous steps to later ones. For this, you can use the syntax ${{ pipeline.<step-name>.work.results.<field> }} and save it under the step's reference key:

YAML
version: "2"
name: example_reference
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: use-function
      stage: 1
      matrix:
        a: [1.2, 2.5]
        b: [5]
        event:
          - [11122, 11121]
      work:
        function: workflow.examples.function.math
        event: ${{matrix.event}}
        parameters:
          alpha: ${{matrix.a}}
          beta: ${{matrix.b}}
    - name: use-results
      stage: 2
      reference: # (1)
        sum: ${{pipeline.use-function.work.results.sum}} # (2)
        product: ${{pipeline.use-function.work.results.product}}
      matrix:
        sum: ${{reference.sum}}
        product: ${{reference.product}}
      work:
        command: ["echo ${{matrix.sum}} && echo ${{matrix.product}}"] # (3)
  1. We define reference when we want to use values from previous steps.
  2. Take the sum result from step use-function.
  3. Use the reference with the appropriate syntax.

Scheduling configurations

Once you have your configuration created and tested, you may want to run it at specific times, perhaps for DevOps processes, monitoring tasks, or automating your everyday job ;).

By adding the schedule key to your configuration, at the same level as pipeline, you can set cronspec and lives values to control when your configuration runs and how many times.

YAML
version: "2"
name: schedule_example
schedule:
  cronspec: "5 4 * * 2" # (1)
  lives: 10 # (2)
defaults:
  user: test
  site: local

pipeline:
  steps:
    - name: daily-monitoring-task
      stage: 1
      work: # Work object payload
        function: guidelines.example.alpha
        parameters:
          mu0: ${{ matrix.mu0 }}
          alpha: ${{ matrix.alpha }}
          sigma0: 22.0
  1. This cronspec will trigger the configuration every Tuesday at 04:05 AM.
  2. The trigger will only happen 10 times; after that, the schedule will enter the "expired" status. For an indefinite number of runs, use -1.
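
For reference, the cronspec "5 4 * * 2" follows the standard crontab field order: minute (5), hour (4), day of month (*), month (*), and day of week (2 = Tuesday), which is why it fires every Tuesday at 04:05.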