# Setting up Datatrail at a new site

## Prerequisites
Network File System (NFS) access to the following paths on all nodes:

- `/data/{site}/cert`
- `/data/{site}/baseband/raw`
- `/data/{site}/any/other/data/directories`

This is configured in `/etc/exports` and `/etc/fstab`.
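A minimal sketch of the two files, assuming a hypothetical site `kko`, a placeholder server hostname, and a placeholder subnet; the export options shown are common defaults and should be adapted to your deployment:

```
# /etc/exports on the NFS server (subnet and options are placeholders):
/data/kko  10.0.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab on each client node (server hostname is a placeholder):
nfs-server:/data/kko  /data/kko  nfs  defaults,_netdev  0  0
```

After editing `/etc/exports`, re-export with `exportfs -ra` on the server; clients pick up the `fstab` entry on `mount -a` or reboot.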
Strongly suggested: directory structure

The directory structure should conform to the following:
| Data Product | Location |
|---|---|
| Baseband Raw Data | `/data/{SITE}/baseband/raw/YYYY/MM/DD/{EVENT_NUMBER}/*` |
| Baseband Processed Data | `/data/{SITE}/baseband/processed/YYYY/MM/DD/{EVENT_NUMBER}/*` |
| Intensity Raw Data | `/data/{SITE}/intensity/raw/YYYY/MM/DD/{EVENT_NUMBER}/*` |
| Intensity Processed Data | `/data/{SITE}/intensity/processed/YYYY/MM/DD/{EVENT_NUMBER}/*` |
| Calibration Raw Data | `/data/{SITE}/calibration/raw/YYYY/MM/DD/{SOURCE_NAME}/*` |
| Calibration Processed Data | `/data/{SITE}/calibration/processed/YYYY/MM/DD/{SOURCE_NAME}/*` |
| Acquisition Raw Data | `/data/{SITE}/acquisition/raw/{ACQUISITION_NAME}/*` |
| Acquisition Processed Data | `/data/{SITE}/acquisition/processed/{ACQUISITION_NAME}/*` |
| Services Data | `/data/{SITE}/services/{SERVICE_NAME}/*` |
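The event-style conventions above can be expressed as a small path builder. This helper is purely illustrative (it is not part of Datatrail); the argument names are assumptions:

```python
from datetime import datetime, timezone

def event_path(site, data_type, stage, event_number, when):
    """Build the conventional on-disk directory for an event dataset:
    /data/{SITE}/{type}/{raw|processed}/YYYY/MM/DD/{EVENT_NUMBER}."""
    return f"/data/{site}/{data_type}/{stage}/{when:%Y/%m/%d}/{event_number}"

p = event_path(
    "kko", "baseband", "raw", "astro_123456789",
    datetime(2021, 12, 8, tzinfo=timezone.utc),
)
# → /data/kko/baseband/raw/2021/12/08/astro_123456789
```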
## Steps

### 1. Create a new storage element for a new site
Naming convention: names cannot contain an underscore due to conflicts with workflow.
For each new storage element that you want to add to Datatrail, a `storage_element` must be created. Below is an example of a storage element defined in Datatrail, along with the command used to create one. For more information, see here.
```json
[
  {
    "name": "chime",
    "storage_type": "field",
    "protocol": "posix",
    "root": "",
    "active": true,
    "address": "tubular.chimenet.ca",
    "total_gb": 512000.0,
    "available_gb": 65536.0,
    "min_available_gb": 25600.0,
    "available_gb_last_checked": "2021-12-08T17:31:19.347388+00:00"
  },
  ...,
]
```
```
commit.storage_element_create?
Signature:
commit.storage_element_create(
    name,
    storage_type,
    protocol,
    root,
    address,
    active,
    total_gb,
    available_gb,
    min_available_gb,
    notes,
    base_url='https://frb.chimenet.ca/datatrail',
    test_mode=False,
)
Docstring: Create a new storage element.
File:      ~/Library/Caches/pypoetry/virtualenvs/datatrail-sZAH7HjW-py3.9/lib/python3.9/site-packages/datatrail/commit.py
Type:      function
```
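Putting the signature together with the JSON example, a call for a hypothetical new site might look like the sketch below. The site name `gbo`, the address, and every size are placeholders, not real values; the actual call is commented out because it contacts the Datatrail server:

```python
# Hypothetical storage element for a new site "gbo"; all values are placeholders.
new_se = {
    "name": "gbo",                 # no underscores (conflicts with workflow)
    "storage_type": "field",
    "protocol": "posix",
    "root": "",
    "address": "gbo.example.org",  # placeholder host
    "active": True,
    "total_gb": 512000.0,
    "available_gb": 65536.0,
    "min_available_gb": 25600.0,
    "notes": "example outrigger storage",
}

# With Datatrail installed, commit it (test_mode=True for a dry run):
# from datatrail import commit
# commit.storage_element_create(**new_se, test_mode=True)
```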
### 2. Create required new larger datasets with policies

During the setup of a new site, many new larger datasets will need to be created for all of the data types that will be captured. In this document, the example scope is limited to baseband data; the steps are the same for any other data type. Note, however, that the registration service presently packaged with Datatrail Admin only searches for baseband data.
Planning of larger datasets and data types

Below is some of the planning that went into the determination of policies during the discussion of data at outrigger sites. The following was decided in the context of KKO.
**Raw Event (full array baseband data)**

- Scope: `{outrigger}.event.baseband.raw`
- Large Data Sets:
    - `classified.FRB`: replicate to MINOC forever, site for 3 months
    - `classified.Pulsar`: site for 1 month
    - `classified.RFI`: site for 1 month
    - `classified.Noise`: site for 1 month
    - `{repeatername}.commissioning.FRB`: replicate to MINOC forever, site for 3 months
    - `{pulsarname}.commissioning.Pulsar`: site for 3 months (must have been beamformed)
    - `{repeatername}.{WG}.science.FRB`: WG decides
    - `{pulsarname}.{WG}.science.Pulsar`: WG decides
    - `Discard`: site for 2 weeks
- Path: `/data/{outrigger}/baseband/raw/YYYY/MM/DD/astro_##########/*`

**Processed Event (beamformed data)**

- Scope: `{outrigger}.event.baseband.beamformed`
- Large Data Sets:
    - `classified.FRB`: replicate to MINOC at high priority and forever, site for 3 months
    - `classified.Pulsar`: replicate to MINOC forever, site for 3 months
    - `classified.RFI`: site for 1 month
    - `classified.Noise`: site for 1 month
    - `{repeatername}.commissioning.FRB`: replicate to MINOC forever, site for 3 months
    - `{pulsarname}.commissioning.Pulsar`: replicate to MINOC forever, site for 3 months
    - `{repeatername}.{WG}.science.FRB`: WG decides
    - `{pulsarname}.{WG}.science.Pulsar`: WG decides
    - `Discard`: site for 2 weeks
- Path: `/data/{outrigger}/baseband/processed/YYYY/MM/DD/astro_##########/*`
...
```python
from datatrail_admin import commit

DS = {
    "name": "classified.FRB",
    "scope": "{site}.event.intensity.raw",
    "replication_policy": {
        "preferred_storage_elements": ["{site}", "minoc"],
        "priority": "high",
    },
    "deletion_policy": [
        {
            "storage_element": "minoc",
            "delete_after_days": 10000000000,
            "priority": "medium",
        },
        {"storage_element": "{site}", "delete_after_days": 10, "priority": "medium"},
    ],
    "belongs_to": [],
}
ds = commit.dataset_create(**DS)
```
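The retention phrases in the planning section ("replicate to MINOC forever, site for 3 months") map mechanically onto `deletion_policy` entries. A hypothetical helper sketching that translation, assuming a 30-day month and treating the very large `delete_after_days` value as a "never delete" sentinel:

```python
FOREVER_DAYS = 10_000_000_000  # sentinel meaning "never delete", as in the example above

def deletion_policy(site, site_months, keep_on_minoc=False):
    """Sketch: turn "site for N months (+ MINOC forever)" into the
    deletion_policy list shape used by commit.dataset_create."""
    policy = [
        {
            "storage_element": site,
            "delete_after_days": site_months * 30,  # assumes 30-day months
            "priority": "medium",
        },
    ]
    if keep_on_minoc:
        policy.append(
            {
                "storage_element": "minoc",
                "delete_after_days": FOREVER_DAYS,
                "priority": "medium",
            }
        )
    return policy

# classified.FRB at KKO: replicate to MINOC forever, keep on site for 3 months.
frb_policy = deletion_policy("kko", 3, keep_on_minoc=True)
```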
Warning

I cannot stress enough how important policies are; they determine whether your data is replicated and deleted! Take care when defining them.
### 3. Launch Datatrail services

To launch the services, a new Docker Compose file will need to be created. This file can be based on `stacks/kko-client.yaml`, shown below. All references to "kko" should be replaced with the name of the site. Additionally, the volumes should all be redefined for the site.