How to capture and register new events
Acquiring Event Data
Currently, if you want to acquire intensity/baseband data for events meeting certain conditions, you request action rules in the L4 pipeline. As part of that process (a sketch follows the list):
- Users need to provide a dataset name so that events acquired using those criteria can be grouped into that dataset.
- Users also need to provide replication and deletion policies for this dataset.
- The action implementer should coordinate with the data manager to ensure that the dataset is also created in Datatrail before implementing the actions (TODO: automate this process).
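Purely for illustration, the information gathered in these steps might be bundled as follows. This is a hypothetical sketch: the field names and structure are invented, and the actual action-rule format is defined by the L4 pipeline.
# Hypothetical sketch only; the real action-rule schema is defined by the L4 pipeline.
action_rule_info = {
    "dataset_name": "classified.FRB",  # groups events acquired under these criteria
    "replication_policy": {
        "preferred_storage_elements": ["chime", "minoc"],
        "priority": "high",
    },
    "deletion_policy": [
        {"storage_element": "chime", "delete_after_days": 10, "priority": "medium"},
    ],
}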
Registering Data
Creating a storage element
A storage element is a space where data are stored for a short or long period of time. The feature set that handles storage elements is currently minimal; regardless, you are encouraged to fill in the information as accurately as possible.
from datatrail_admin import commit
SE = {
    "name": "chime",
    "storage_type": "field",  # choices are field, transport, hot, cold, archive
    "protocol": "posix",  # choices are posix, cadcclient, vosapi
    "root": "/",
    "address": "tubular.chimenet.ca",
    "active": True,
    "total_gb": 155000,  # total capacity of the storage element
    "available_gb": 34000,  # space currently free
    "min_available_gb": 10000,  # minimum free space to maintain
    "notes": "This is CHIME",
}
se = commit.storage_element_create(**SE)
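The replication and deletion policies used later in this guide refer to other storage elements by name (e.g. minoc), and each of those must be created the same way. A minimal sketch for minoc, assuming it is the CADC archive reached via cadcclient; the address and capacity numbers below are placeholders:
MINOC = {
    "name": "minoc",
    "storage_type": "archive",
    "protocol": "cadcclient",
    "root": "/",
    "address": "ws-uv.canfar.net",  # placeholder address
    "active": True,
    "total_gb": 1000000,  # placeholder capacity
    "available_gb": 500000,  # placeholder
    "min_available_gb": 10000,  # placeholder
    "notes": "CADC minoc archive",
}
minoc = commit.storage_element_create(**MINOC)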
Create a larger dataset
In order to register datasets that will be captured from new action rules added to the L4 pipeline, you need to create a larger dataset which corresponds to the label in the action rules. If the larger dataset already exists, this step can be skipped. Look at this page for info on how to search for a dataset.
Dataset and Scope
Note that it is the combination of dataset name and scope that makes a dataset unique. Therefore, if the new action rule is for intensity data (scope chime.event.intensity.raw) but the larger dataset was previously created only for baseband data (scope chime.event.baseband.raw), then a new dataset for the intensity scope must still be created.
from datatrail_admin import commit
DS = {
    "name": "classified.FRB",
    "scope": "chime.event.intensity.raw",
    "replication_policy": {
        "preferred_storage_elements": ["chime", "minoc"],
        "priority": "high",
    },
    "deletion_policy": [
        {
            "storage_element": "minoc",
            "delete_after_days": 10000000000,  # effectively never delete from minoc
            "priority": "medium",
        },
        {"storage_element": "chime", "delete_after_days": 10, "priority": "medium"},
    ],
    "belongs_to": [],
}
ds = commit.dataset_create(**DS)
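Following the note above, if the same action rule also captures baseband data, the larger dataset must be created again under the baseband scope. A minimal sketch reusing the payload above; in practice the baseband policies may differ:
DS_BASEBAND = {**DS, "scope": "chime.event.baseband.raw"}
ds_baseband = commit.dataset_create(**DS_BASEBAND)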
Register Files
This step is for information only; during nominal operation, the registration service is where files and file replicas are registered.
import datetime
import pytz
from datatrail_admin import commit
files = [
    {
        "name": "astro_9386707_2018072512300203.msgpack",
        "path": "/data/chime/intensity/raw/2018/07/25/astro_9386707/astro_9386707_2018072512300203.msgpack",
        "md5sum": "1ksdkj42kskdjk2k4ksf",  # placeholder checksum
        "date_created": pytz.utc.localize(datetime.datetime.utcnow()).isoformat(),
    }
]
status = commit.file_register(
    dataset_name="9386707",
    dataset_scope="chime.event.intensity.raw",
    storage_name="chime",
    files=files,
    attach_to_dataset="classified.FRB",
)
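Here the files are registered to the per-event dataset 9386707 under the chime.event.intensity.raw scope, and attach_to_dataset groups that event dataset into the larger classified.FRB dataset created above, which carries the replication and deletion policies.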
Registering datasets
When registering datasets, there are two broad categories of data:
- Event based (e.g. full array baseband, intensity, tracking beam acquisitions of pulsars as calibrators for events). Here, we need a tsar classification for the associated event before registering the event, so that appropriate policies can be applied depending on its classification. To make this a bit more challenging, tsar classifications typically take two tsar shifts, since we need two classifications, which could take anywhere from 4 to 24 hours.
- Non-event based (e.g. N-squared, data captured for other WG studies). Here, we do not need a tsar classification for the associated event. The expectation is that the policies have been discussed with the DAWG and that the larger dataset into which the smaller datasets will be grouped has already been created.
For a complete list of dataset categories for CHIME/outriggers, see this doc.
In order to register a dataset, you simply need to submit work to a bucket. Here is the payload:
from chime_frb_api.workflow import Work
site = "kko" # or chime, gbo, hco etc.
payload = {
    # required
    "name": "dataset-name",
    # required
    "scope": "dataset-scope",
    # required
    "storage_element": "storage-element-name",  # e.g. chime, minoc, arc, kko
    # required
    # Same as "storage_element" at registration time, but after replication this
    # field will NOT change, while "storage_element" changes depending on where
    # the replica is copied to.
    "storage_element_captured_at": "storage-element-name",
    # required
    "data_path": "/data/kko/baseband/raw/2023/01/01/astro_987543784/",
    # Populate exactly one of the two optional fields below.
    # optional: only populate this field for datasets that do not require tsar classification.
    "attach_to_dataset": {
        "name": "larger-dataset",  # e.g. outrigger.commissioning.B0329+54
        "scope": "dataset-scope",
    },
    # optional: only populate this field for datasets that need tsar classification.
    "associated_event": {
        "event": 9386707,
        "data_type": "baseband",
    },
}
w = Work("datatrail-register-dataset", site=site, parameters=payload)
w.deposit()
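For instance, an event-based acquisition awaiting tsar classification would populate only associated_event, while a non-event-based dataset would populate only attach_to_dataset. A minimal sketch of the event-based case; the name, scope, path, and event number below are illustrative:
event_payload = {
    "name": "9386707",
    "scope": "chime.event.baseband.raw",
    "storage_element": "kko",
    "storage_element_captured_at": "kko",
    "data_path": "/data/kko/baseband/raw/2023/01/01/astro_9386707/",
    "associated_event": {"event": 9386707, "data_type": "baseband"},
}
w = Work("datatrail-register-dataset", site="kko", parameters=event_payload)
w.deposit()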