Definitions¶

Database models¶

All database models are defined here.

File¶

DB Table containing name and specs (md5sum, etc.).

Storage Element¶

DB Table containing name, address of the storage location, transfer protocol it prefers and information about disk usage. * Examples of storage element: * CHIME, CANFAR, CEDAR, BELUGA

File Replica¶

DB Table that maps physical copies of files on a storage element. For each file there will be a replica on the storage elements where the file physically is.It contains information about file path on the storage element, replication, deletion and aggregation state, priority and protocols.

Dataset¶

DB Table. A collection of files or other datasets. It is used to encapsulate files and datasets for different purposes. Datasets are also where we apply replication, deletion and aggregation policies.

Hierarchy¶

Files can belong to a dataset.
Datasets can belong to other datasets.
Files → dataset X→ dataset A → dataset Z …
This allows you to have things like:
Intensity files belong to an event X.
Event X belongs to ClassifiedFRBs, Catalog1 and Catalog2.

Replication Policy¶

Policy for replication of files belonging to a dataset. This policy is added to a dataset when it is created.
Num replicas, preferred_storage_elemenets, priority.
When new files are attached to a dataset, the policy is translated to the file and file replica tables as appropriate properties.
When a dataset is attached to another dataset with a different set of policies, we take the least destructive action on the file replicas.
Replicate sooner of the two policies.
Default Policy only for Core datasets (for derived datasets, an explicit policy needs to be provided):
Replication: num replicas 1, replication_priority low, preferred_storage_site CHIME.

Deletion Policy¶

Policy for deletion of file replicas belonging to a dataset on storage elements. This policy is added to a dataset when it is created. For each storage element we require:
delete_after_days, priority.
- Eg. {“CHIME”: {“delete_after_days”: 10, “priority”: “HIGH”}, “CANFAR”: {“delete_after_days”: None, “priority”: low}}
When new files are attached to a dataset, the policy is translated to the file and file replica tables as appropriate properties.
When a dataset is attached to another dataset with a different set of policies, we take the least destructive action on the file replicas.
Delete on a later date of the two policies.
Default Policy only core dataset (for derived datasets, an explicit policy needs to be provided):
Deletion: {“Field Site”: {“delete_after_days”: 90, “priority”: “LOW”}}.

Transformation¶

Replication, Deletion, or Aggregation: basically a core unit of work that is done by DataTrail.

Datatrail Server¶

This is the server that talks to DataTrail’s Postgres database. It has the following main components: * REST Server: where clients can register files, policies etc. and where status about transformations are updated. This is horizontally scalable. * Policy Evaluator: (defined below) * Stager daemons (defined below) * Federator daemons (defined below)

This is the brains of the data management system. All the logic is decided here on which transformation to take on which storage element. This is centralized at CHIME.