Workflow Tutorial¶
Installation¶
Step 1: Clone the Repository¶
First, clone the workflow repository:
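A minimal sketch (the URL assumes the repository lives under the CHIMEFRB GitHub organization):

git clone https://github.com/CHIMEFRB/workflow.git  # assumed repository URL
cd workflow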
Step 2: Install dependencies with Poetry¶
Install all dependencies needed to run workflow using the following command:
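For example, from the repository root (assuming Poetry is already installed):

poetry install  # installs workflow and its dependencies into a Poetry-managed environment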
This will allow you to call workflow from the command line.
Step 3: Check your workspace¶
All services in the workflow ecosystem need a workspace to function correctly. A workspace is a YAML file that defines several configurations, such as base URLs, deployers, Work.config values, archive locations, sites, etc.
This tutorial uses the development.yml workspace located in the workflow/workspaces/ folder of the workflow repository:
workspace: development
# List the valid sites for this workspace
sites:
  - local
http:
  baseurls:
    configs: http://localhost:8001/v2
    pipelines: http://localhost:8001/v2
    schedules: http://localhost:8001/v2
    buckets: http://localhost:8004
    results: http://localhost:8005
archive:
  mounts:
    local: "/"
config:
  archive:
    plots:
      methods:
        - "bypass"
        - "copy"
        - "delete"
        - "move"
      storage: "s3"
    products:
      methods:
        - "bypass"
        - "copy"
        - "delete"
        - "move"
      storage: "posix"
    results: true
    permissions:
      posix:
        user: "user"
        group: "chime-frb-ro"
        command: "setfacl -R -m g:{group}:r {path}"
deployers:
  local:
    docker:
      client_url: unix:///var/run/docker.sock
      networks:
        workflow: # This network maps to the docker-compose.yml network name
You will need to set the development workspace so workflow knows where the services' backends are located:
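A sketch of the assumed invocation (the workspace subcommand is inferred from the output below):

poetry run workflow workspace set development  # assumed subcommand name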
Locating workspace development
Reading ./workflow/workspaces/development.yml
Workspace development set to active.
Step 4: Launch the Required Services¶
To launch all services required for this tutorial, you need to launch the stack defined in docker-compose.yml:
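For example, from the repository root (standard Docker Compose usage):

docker compose up -d  # -d runs the stack in the background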
This will launch the following services:

- Buckets: Queue of Work objects.
- Results: Work results storage.
- mongo: MongoDB used by all services.
- Pipelines API: Pipelines server that workflow will talk to.
- Pipelines Managers: Managers server in charge of monitoring tasks.
Creating Your First Config¶
Now, let’s create your first config:
From the workflow folder, create a file named hello-world-config.yaml and add the following content:
version: "2"
name: hello-world-test
pipeline:
steps:
- name: hello-world
stage: 1
work:
command: ["echo", "hello", "world"]
user: test
site: local
In this example, we have only one step that contains a Work object to be executed in the first stage. It runs a single bash command: echo "hello world". This is the most basic configuration you can create.
The main component of a configuration is the pipeline. A pipeline consists of a list of steps for which we can define the order of execution using the stage key.
Deposit a Configuration¶
To send your configuration to workflow, use the following command:
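A sketch of the assumed invocation (the configs deploy subcommand and its argument are assumptions based on the output below):

poetry run workflow configs deploy hello-world-config.yaml  # assumed subcommand and argument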
Workflow Configs
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Deploy Result ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ config: 667c2806a0e2440586c3cfdb │
│ pipelines: │
│ 667c2806a0e2440586c3cfda │
│ │
└────────────────────────────────────────────────┘
This command will return the IDs for the Config object deposited and for the Pipeline objects created based on the configuration.
The Work object created by this Config will stay in Buckets until some worker takes it and executes it.
We can act as such a worker from our own machine, executing the Work object locally with the following command:
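A sketch of the assumed invocation, following the poetry run workflow run <buckets> form mentioned later in this tutorial (the --site flag is an assumption):

poetry run workflow run hello-world --site local  # assumed flag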
This command will search for a Work object named hello-world from site local and execute it.
Check the status of your pipelines¶
We can monitor the current status of our pipelines from the workflow CLI. For this we only need the Config ID:
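A sketch of the assumed invocation (the argument form is an assumption; per the note above, only the Config ID is needed):

poetry run workflow configs ps 667c2806a0e2440586c3cfdb  # assumed arguments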
Workflow Configs
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Config: hello-world-test ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ id: 667c2806a0e2440586c3cfdb │
│ version: 1 │
│ pipelines: │
│ 667c2806a0e2440586c3cfda: ✅ │
│ deployments: None │
│ user: test │
│ │
├─────────────────────────────────────────────────────────────┤
│ Explore pipelines in detail: │
│ workflow pipelines ps <config_name> <pipeline_id> │
└─────────────────────────────────────────────────────────────┘
We can see that our pipeline has a green check, which means it was completed. To check the details of the pipelines, we can follow the example at the bottom of the configs ps output.
All pipelines generated by a Config will be located under that Config.name. For example, our config is named hello-world-test, so all of the pipelines generated by this config (in this case just one) need to be queried using that name:
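Following the usage hint printed by configs ps (workflow pipelines ps <config_name> <pipeline_id>):

poetry run workflow pipelines ps hello-world-test 667c2806a0e2440586c3cfda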
Workflow Pipelines
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pipeline: hello-world-test ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ id: 667c2806a0e2440586c3cfda │
│ current_stage: 1 │
│ status: success │
│ creation: 2024-06-26 10:39:02.276542 │
│ start: 2024-06-26 10:39:02.524662 │
│ stop: 2024-06-26 10:40:02.855216 │
│ steps: │
│ hello-world:✅ │
│ │
└────────────────────────────────────────────────┘
Using defaults¶
defaults is a quality-of-life feature that allows us to define, in a single place, a set of parameters for all of the Work objects defined in our pipeline.
For example, let's say that our pipeline contains 100 steps. Every Work object needs to know the user and the site, but it can become tedious to write those values every time we add Work to our pipeline. For this we can use defaults:
version: "2"
name: files-test
defaults:
user: "test"
site: "local" #(1)
tags: ["tag1"]
pipeline:
steps:
- name: create-file
stage: 1
work:
command: ["touch", "test.py", ";", "echo", "print('hello')", ">", "test.py"]
- name: run-file
stage: 2
work:
command: ["python test.py"]
- name: finish
stage: 3
work:
command: ["echo Finished"]
- The values defined in defaults will be applied to all of the Work objects in the pipeline.
Extending the Work payload¶
As described previously, every step contains a work payload, and this payload accepts all of the parameters that the Work object accepts.
version: "2"
name: tags-test
defaults:
user: "test"
site: "local" #(1)
tags: ["tag1"]
pipeline:
steps:
- name: sum
stage: 1
work:
function: workflow.examples.function.math #(1)
event: [594688]
parameters:
alpha: 1
beta: 5
notify:
slack:
...
config:
archive:
results: true
products: "copy"
plots: "move"
priority: 4
- This is an example function, provided in the workflow repo.
Adding deployments¶
Until now, we have been running the command poetry run workflow run <buckets> to execute the Work objects that we define in our configurations, but workflow allows you to automate even this by defining deployments in your configuration file:
version: "2"
name: example_deployments
defaults:
user: test
site: local
deployments:
- name: ld1 # (1)
site: local
image: chimefrb/workflow:latest
sleep: 1
resources:
cores: 2
ram: 1G
replicas: 2
- name: ld2
site: local
image: chimefrb/workflow:latest
sleep: 1
resources:
cores: 1
ram: 512M
replicas: 1
pipeline:
runs_on: ld1 # (2)
steps:
- name: echo
stage: 1
runs_on: ld2
work:
command: ["ls", "-lah"]
- name: uname
runs_on: ld2 # (3)
stage: 2
work:
command: ["uname", "-a"]
- name: printenv
stage: 3
work:
command: ["printenv"]
- Deployment definition. This has to be used on our pipeline or on a step.
- In this example we add the deployment to the pipeline using the key runs_on. This means all of the pipeline's steps will run on this container.
- We can add a deployment to a step, also using runs_on; this will make the step run only on the specified deployment and ignore the deployment inherited from the pipeline.
A configuration can have one or more deployments, and these deployments have to be used either on the pipeline or in a step. A deployment is just a mapping that instructs the pipelines backend to create a Docker container and run the specified object in it.
Pipeline and step replication¶
When creating a configuration we can define a matrix strategy on our pipeline or on our steps; this matrix will replicate the object depending on the values that we define inside of it. A matrix is a mapping where each key can have a list of values; pipelines will determine all of the possible combinations of the values to replicate one object:
pipeline:
  matrix:
    alpha: [1, 2, 3]
  steps:
    - name: replicated_step
      stage: 1
      function: workflow.examples.function.math
      parameters:
        alpha: 5
        beta: ${{matrix.alpha}} # (1)
- In this example we define a matrix for the pipeline (top level). This matrix will make the pipeline replicate 3 times (once for each value defined on the alpha key) and will replace the values where we use the replace syntax (${{matrix.<key>}}).
We can also apply a matrix strategy to each of our steps:
version: 1
name: example_matrix_steps
defaults:
  user: test
  site: local
pipeline:
  steps:
    - name: check-pids
      stage: 1
      matrix:
        pid: [256, 554, 2849]
      work:
        command: ["ps", "-p", "${{matrix.pid}}"]
Running steps with conditions¶
When creating a Config we may want to execute some steps depending on conditions. The pipelines backend provides simple but useful conditions that can enhance your workflow. To add a condition to a specific step you need to use the if key.
- success: Add this condition to a step so it runs only if the pipeline is succeeding.
- failure: This condition will make the step run only if the pipeline fails.
- always: Use this condition to run the step regardless of the pipeline state.
version: "2"
name: example
defaults:
user: test
site: local
pipeline:
steps:
- name: alpha
stage: 1
matrix:
alpha:
range:
work:
function: function.that.will.fail
success_threshold: 0.7
- name: Cleanup failed task
if: failure() # (1)
stage: 2
work:
command: ["echo", "cleaning"]
- Will run since the step alpha will fail.
Adding success threshold¶
When using a matrix on a step, we may want to add a success threshold to define the fraction of replicated steps that must succeed in order to proceed to the next stage. We can achieve this by adding the success_threshold key to the step with the matrix.
version: "2"
name: example_success_threshold
defaults:
user: test
site: local
pipeline:
steps:
- name: alpha
matrix: # (1)
ranged_value:
range: [100, 110]
message: ["message1", "message2"]
stage: 1
work:
command: ["echo", "${{ matrix.message }}", "${{ matrix.ranged_value }}"]
success_threshold: 0.7 # (2)
- name: beta
stage: 2
work:
command: ["echo", "succeed"]
- Matrix definition that will replicate the step alpha 20 times (10 values from ranged_value times 2 values from message).
- We have defined a success threshold of 0.7, meaning that at least 70% of the replicated steps have to succeed in order to pass to the next stage.
Referencing results from a previous step¶
When creating more complex configurations, we may want to pass results from previous steps to the next. For this, you can use the syntax ${{ pipeline.<step-name>.work.results.<field> }} and save it under the step's reference key:
version: "2"
name: example_reference
defaults:
user: test
site: local
pipeline:
steps:
- name: use-function
stage: 1
matrix:
a: [1.2, 2.5]
b: [5]
event:
- [11122, 11121]
work:
function: workflow.examples.function.math
event: ${{matrix.event}}
parameters:
alpha: ${{matrix.a}}
beta: ${{matrix.b}}
- name: use-results
stage: 2
reference: # (1)
sum: ${{pipeline.use-function.work.results.sum}} # (2)
product: ${{pipeline.use-function.work.results.product}}
matrix:
sum: ${{reference.sum}}
product: ${{reference.product}}
work:
command: ["echo ${{matrix.sum}} && echo ${{matrix.product}}"] # (3)
- We define a reference when we want to use values from previous steps.
- Take the sum result from step use-function.
- Use the reference with the appropriate syntax.
Scheduling configurations¶
Once you have your configuration created and tested, you may want to run it at specific times, maybe for DevOps processes, monitoring tasks, or automating your everyday job ;).
By adding the schedule key to your configuration, at the same level as pipeline, you can set cronspec and lives values to control when your configuration runs and how many times.
version: "2"
name: schedule_example
schedule:
cronspec: "5 4 * * 2" # (1)
lives: 10 # (2)
defaults:
user: test
site: local
pipeline:
steps:
- name: daily-monitoring-task
stage: 1
work: # ? Work object payload
function: guidelines.example.alpha
parameters:
mu0: ${{ matrix.mu0 }}
alpha: ${{ matrix.alpha }}
sigma0: 22.0
- This cronspec will trigger the configuration every Tuesday at 04:05 AM.
- The trigger will only happen 10 times; after that the schedule will enter the status "expired". For an indefinite schedule, use -1.