Setting up Datatrail at a new site

Prerequisites

A Network File System (NFS) share that gives all nodes access to the following paths:

Text Only

- `/data/{site}/cert`
- `/data/{site}/baseband/raw`
- `/data/{site}/any/other/data/directories`

The exports are configured in /etc/exports on the NFS server, and the mounts in /etc/fstab on each client node.
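Before starting any Datatrail service, it can help to confirm that the shared paths are actually visible on each node. The following is a minimal sketch and not part of Datatrail: the function name `missing_nfs_paths` and the site name `mysite` are hypothetical, and only the two concrete paths from the list above are checked by default.

```python
import os

# Path templates taken from the prerequisites list above.
# Append the site's other data directories as needed.
REQUIRED_TEMPLATES = [
    "/data/{site}/cert",
    "/data/{site}/baseband/raw",
]


def missing_nfs_paths(site, templates=REQUIRED_TEMPLATES, exists=os.path.isdir):
    """Return the required paths that are not visible as directories on this node."""
    paths = [template.format(site=site) for template in templates]
    return [path for path in paths if not exists(path)]


if __name__ == "__main__":
    # "mysite" is a placeholder site name.
    for path in missing_nfs_paths("mysite"):
        print(f"missing: {path}")
```

Running this on every node gives a quick indication of whether an export or mount entry was missed. The `exists` parameter is only there so the check can be exercised without a real mount.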

Strongly Recommended: Directory Structure

The directory layout should conform to the following structure:

Data Product	Location
Baseband Raw Data	/data/{SITE}/baseband/raw/YYYY/MM/DD/{EVENT_NUMBER}/*
Baseband Processed Data	/data/{SITE}/baseband/processed/YYYY/MM/DD/{EVENT_NUMBER}/*
Intensity Raw Data	/data/{SITE}/intensity/raw/YYYY/MM/DD/{EVENT_NUMBER}/*
Intensity Processed Data	/data/{SITE}/intensity/processed/YYYY/MM/DD/{EVENT_NUMBER}/*
Calibration Raw Data	/data/{SITE}/calibration/raw/YYYY/MM/DD/{SOURCE_NAME}/*
Calibration Processed Data	/data/{SITE}/calibration/processed/YYYY/MM/DD/{SOURCE_NAME}/*
Acquisition Raw Data	/data/{SITE}/acquisition/raw/{ACQUISITION_NAME}/*
Acquisition Processed Data	/data/{SITE}/acquisition/processed/{ACQUISITION_NAME}/*
Services Data	/data/{SITE}/services/{SERVICE_NAME}/*
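As an illustration of the event-based rows in the table, the sketch below builds the directory for a baseband or intensity event. The helper `event_dir` is hypothetical and not part of Datatrail, and the event number used in the usage line is made up.

```python
from datetime import date
from pathlib import PurePosixPath


def event_dir(site, data_type, stage, day, event_number):
    """Build /data/{SITE}/{type}/{stage}/YYYY/MM/DD/{EVENT_NUMBER} for event data."""
    if data_type not in ("baseband", "intensity"):
        raise ValueError(f"not an event data type: {data_type!r}")
    if stage not in ("raw", "processed"):
        raise ValueError(f"unknown stage: {stage!r}")
    return PurePosixPath(
        "/data", site, data_type, stage,
        f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}",
        str(event_number),
    )


if __name__ == "__main__":
    # Hypothetical event number, shown only to illustrate the layout.
    print(event_dir("kko", "baseband", "raw", date(2023, 1, 5), 12345))
```

Generating paths from one helper, rather than formatting them by hand in each script, makes it harder for a site to drift away from the recommended layout.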

Steps

1. Create a new storage element for the new site

Naming convention

Names cannot contain underscores, because underscores conflict with workflow.

Every storage element that you want to add to Datatrail needs its own storage_element record. Below are an example of a storage element already defined in Datatrail and the signature of the command that creates one. For more information, see here.

Python

[
  {
    "name":"chime",
    "storage_type":"field",
    "protocol":"posix",
    "root":"",
    "active":true,
    "address":"tubular.chimenet.ca",
    "total_gb":512000.0,
    "available_gb":65536.0,
    "min_available_gb":25600.0,
    "available_gb_last_checked":"2021-12-08T17:31:19.347388+00:00"
  },
  ...,
]

Python

commit.storage_element_create?
Signature:
commit.storage_element_create(
    name,
    storage_type,
    protocol,
    root,
    address,
    active,
    total_gb,
    available_gb,
    min_available_gb,
    notes,
    base_url='https://frb.chimenet.ca/datatrail',
    test_mode=False,
)
Docstring: Create a new storage element.
File:      ~/Library/Caches/pypoetry/virtualenvs/datatrail-sZAH7HjW-py3.9/lib/python3.9/site-packages/datatrail/commit.py
Type:      function
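The signature above can be combined with the naming convention in a small pre-flight helper. This is a sketch under stated assumptions: `storage_element_kwargs` is a hypothetical helper, the defaults for `storage_type`, `protocol`, and `root` are simply copied from the `chime` example and may not suit a new site, and the site name, address, and sizes in the usage section are made up.

```python
def storage_element_kwargs(name, address, total_gb, available_gb, min_available_gb,
                           storage_type="field", protocol="posix", root="",
                           active=True, notes=""):
    """Assemble keyword arguments matching the storage_element_create signature."""
    # Naming convention: underscores are not allowed in names.
    if "_" in name:
        raise ValueError(f"storage element name must not contain '_': {name!r}")
    if not 0 <= min_available_gb <= total_gb:
        raise ValueError("min_available_gb must lie between 0 and total_gb")
    if not 0 <= available_gb <= total_gb:
        raise ValueError("available_gb must lie between 0 and total_gb")
    return {
        "name": name,
        "storage_type": storage_type,
        "protocol": protocol,
        "root": root,
        "address": address,
        "active": active,
        "total_gb": float(total_gb),
        "available_gb": float(available_gb),
        "min_available_gb": float(min_available_gb),
        "notes": notes,
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    kwargs = storage_element_kwargs(
        name="mysite",
        address="storage.example.org",
        total_gb=1000,
        available_gb=800,
        min_available_gb=100,
        notes="created during site setup",
    )
    print(kwargs)
    # With the admin package installed, the record could then be created with:
    #   from datatrail_admin import commit
    #   commit.storage_element_create(**kwargs)
```

Validating the name and sizes locally means an obviously malformed record is rejected before anything is sent to the Datatrail server.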

2. Create the required larger datasets with policies

When setting up a new site, a larger dataset must be created for each type of data that will be captured. This document limits its example to baseband data; the steps are the same for any other data type. Note, however, that the registration service currently packaged with Datatrail Admin only searches for baseband data.

Planning of larger datasets and data types

Below is some of the planning that went into determining the policies during the discussion of data at outrigger sites. The following was decided in the context of KKO.

Text Only

Raw Event  (full array baseband data)
Scope: {outrigger}.event.baseband.raw
Large Data Sets
classified.FRB     replicate to MINOC forever, site for 3 months
classified.Pulsar  site for 1 month
classified.RFI       site for 1 month
classified.Noise   site for 1 month
{repeatername}.commissioning.FRB   replicate to MINOC forever, site for 3 months
{pulsarname}.commissioning.Pulsar   site for 3 months (must have been beamformed)
{repeatername}.{WG}.science.FRB   WG decides
{pulsarname}.{WG}.science.Pulsar    WG decides
Discard  site for 2 weeks
Path: /data/{outrigger}/baseband/raw/YYYY/MM/DD/astro_##########/*

Processed Event (beamformed data)
Scope: {outrigger}.event.baseband.beamformed
Large Data Sets
classified.FRB    replicate to MINOC high priority and forever, site for 3 months
classified.Pulsar  replicate to MINOC forever, site for 3 months
classified.RFI       site for 1 month
classified.Noise   site for 1 month
{repeatername}.commissioning.FRB  replicate to MINOC forever, site for 3 months
{pulsarname}.commissioning.Pulsar   replicate to MINOC forever, site for 3 months
{repeatername}.{WG}.science.FRB  WG decides
{pulsarname}.{WG}.science.Pulsar   WG decides
Discard   site for 2 weeks
Path: /data/{outrigger}/baseband/processed/YYYY/MM/DD/astro_##########/*

...

Python

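from datatrail_admin import commit

# Replace every "{site}" placeholder below with the name of the site's storage element.
DS = {
    "name": "classified.FRB",
    "scope": "{site}.event.intensity.raw",
    # Where copies should live and how urgently they are replicated.
    "replication_policy": {
        "preferred_storage_elements": ["{site}", "minoc"],
        "priority": "high",
    },
    # One rule per storage element: how long a copy is kept there.
    "deletion_policy": [
        {
            "storage_element": "minoc",
            # Very large value, so the copy is effectively never deleted.
            "delete_after_days": 10000000000,
            "priority": "medium",
        },
        {"storage_element": "{site}", "delete_after_days": 10, "priority": "medium"},
    ],
    "belongs_to": [],
}

# Register the dataset and its policies with Datatrail.
ds = commit.dataset_create(**DS)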
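Because many datasets share the same shape, the planning notes can be turned into definitions with a small builder. The sketch below is illustrative only: `make_dataset` is a hypothetical helper, a month is assumed to be 30 days, "forever" reuses the very large value from the example above, and it is assumed that "medium" is accepted as a replication priority in the same way it is for deletion.

```python
DAYS_PER_MONTH = 30  # assumption: planning is in months, policies are in days
FOREVER_DAYS = 10_000_000_000  # same "effectively never" value as the example above


def make_dataset(name, scope, site, site_days, replicate_to_minoc=False, priority="medium"):
    """Build a dataset definition in the same shape as the DS example."""
    elements = [site]
    deletion = [
        {"storage_element": site, "delete_after_days": site_days, "priority": "medium"},
    ]
    if replicate_to_minoc:
        elements.append("minoc")
        deletion.insert(
            0,
            {"storage_element": "minoc", "delete_after_days": FOREVER_DAYS, "priority": "medium"},
        )
    return {
        "name": name,
        "scope": scope,
        "replication_policy": {"preferred_storage_elements": elements, "priority": priority},
        "deletion_policy": deletion,
        "belongs_to": [],
    }


if __name__ == "__main__":
    # Raw baseband plan for KKO, taken from the planning notes.
    scope = "kko.event.baseband.raw"
    datasets = [
        make_dataset("classified.FRB", scope, "kko", 3 * DAYS_PER_MONTH, replicate_to_minoc=True),
        make_dataset("classified.Pulsar", scope, "kko", 1 * DAYS_PER_MONTH),
        make_dataset("classified.RFI", scope, "kko", 1 * DAYS_PER_MONTH),
        make_dataset("classified.Noise", scope, "kko", 1 * DAYS_PER_MONTH),
    ]
    for dataset in datasets:
        print(dataset["name"], dataset["deletion_policy"])
```

Each resulting dictionary could then be passed to `commit.dataset_create(**dataset)`, as in the example above, once the retention periods have been reviewed.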

Warning

I cannot stress enough how important policies are: they determine whether your data is replicated and when it is deleted. Take care when defining them.
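One way to act on this warning is to run a sanity check over each definition before committing it. The following sketch is not a Datatrail feature. `policy_problems` is a hypothetical helper, and the rule that every preferred storage element should have exactly the matching deletion rules is an assumption made for illustration, not a documented requirement.

```python
def policy_problems(dataset):
    """Return a list of human-readable problems found in a dataset definition."""
    problems = []
    preferred = dataset["replication_policy"]["preferred_storage_elements"]
    rules = dataset["deletion_policy"]
    ruled = [rule["storage_element"] for rule in rules]

    # Template placeholders such as "{site}" must be filled in before committing.
    for value in [dataset["name"], dataset["scope"], *preferred, *ruled]:
        if "{" in value or "}" in value:
            problems.append(f"unreplaced placeholder in {value!r}")

    # Retention periods must be positive whole numbers of days.
    for rule in rules:
        days = rule["delete_after_days"]
        if not isinstance(days, int) or days <= 0:
            problems.append(f"invalid delete_after_days for {rule['storage_element']!r}: {days!r}")

    # Assumed consistency rule: replication targets and deletion rules should match.
    for element in preferred:
        if element not in ruled:
            problems.append(f"no deletion rule for {element!r}")
    for element in ruled:
        if element not in preferred:
            problems.append(f"deletion rule for {element!r}, which is not a preferred storage element")
    return problems


if __name__ == "__main__":
    template = {
        "name": "classified.FRB",
        "scope": "{site}.event.intensity.raw",
        "replication_policy": {"preferred_storage_elements": ["{site}", "minoc"], "priority": "high"},
        "deletion_policy": [
            {"storage_element": "minoc", "delete_after_days": 10000000000, "priority": "medium"},
            {"storage_element": "{site}", "delete_after_days": 10, "priority": "medium"},
        ],
        "belongs_to": [],
    }
    for problem in policy_problems(template):
        print(problem)
```

Run against the unmodified template, the check reports the `{site}` placeholders that still need to be replaced.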

3. Launch Datatrail services

To launch the services, a new Docker Compose file must be created. This file can be based on stacks/kko-client.yaml, shown below. All references to "kko" should be replaced with the name of the site. Additionally, the volumes should all be redefined for the site.

YAML

{!../stacks/kko-client.yaml!}
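The renaming step can be sketched as a plain text substitution over the KKO stack file. This is a hypothetical helper, not part of the repository: it assumes that replacing the strings "kko" and "KKO" is sufficient, it uses `mysite` as a placeholder site name, and it does not touch the volume definitions, which still need to be rewritten by hand.

```python
from pathlib import Path


def render_stack(template_text, site):
    """Replace every 'kko' and 'KKO' in the template with the new site name."""
    text = template_text.replace("kko", site.lower())
    return text.replace("KKO", site.upper())


if __name__ == "__main__":
    template = Path("stacks/kko-client.yaml")
    if template.exists():
        rendered = render_stack(template.read_text(), "mysite")
        # Review the output, especially the volumes, before saving it as the new stack file.
        print(rendered)
```

Printing the result instead of writing it directly leaves room to review the file and redefine the volumes before it is used.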