Skip to content

Database

Datatrail uses the PostgresSQL database to store all the registered data. Throughout Datatrail, we use the SQLAlchemy ORM to communicate withthe database and perform database queries.

Schema

All database models are defined here.

File

DB Table containing name and specs (md5sum, etc.).

Storage Element

DB Table containing name, address of the storage location, transfer protocol it prefers and information about disk usage. * Examples of storage element: * CHIME, CANFAR, CEDAR, BELUGA

File Replica

DB Table that maps physical copies of files on a storage element. For each file there will be a replica on the storage elements where the file physically is.It contains information about file path on the storage element, replication, deletion and aggregation state, priority and protocols.

Dataset

DB Table. A collection of files or other datasets. It is used to encapsulate files and datasets for different purposes. Datasets are also where we apply replication, deletion and aggregation policies.

Hierarchy
  • Files can belong to a dataset.
  • Datasets can belong to other datasets.
  • Files → dataset X→ dataset A → dataset Z …
  • This allows you to have things like:
  • Intensity files belong to an event X.
  • Event X belongs to ClassifiedFRBs, Catalog1 and Catalog2.
Replication Policy
  • Policy for replication of files belonging to a dataset. This policy is added to a dataset when it is created.
  • Num replicas, preferred_storage_elemenets, priority.
  • When new files are attached to a dataset, the policy is translated to the file and file replica tables as appropriate properties.
  • When a dataset is attached to another dataset with a different set of policies, we take the least destructive action on the file replicas.
  • Replicate sooner of the two policies.
  • Default Policy only for Core datasets (for derived datasets, an explicit policy needs to be provided):
  • Replication: num replicas 1, replication_priority low, preferred_storage_site CHIME.
Deletion Policy
  • Policy for deletion of file replicas belonging to a dataset on storage elements. This policy is added to a dataset when it is created. For each storage element we require:
  • delete_after_days, priority.
    • Eg. {“CHIME”: {“delete_after_days”: 10, “priority”: “HIGH”}, “CANFAR”: {“delete_after_days”: None, “priority”: low}}
  • When new files are attached to a dataset, the policy is translated to the file and file replica tables as appropriate properties.
  • When a dataset is attached to another dataset with a different set of policies, we take the least destructive action on the file replicas.
  • Delete on a later date of the two policies.
  • Default Policy only core dataset (for derived datasets, an explicit policy needs to be provided):
  • Deletion: {“Field Site”: {“delete_after_days”: 90, “priority”: “LOW”}}.

Migration

To add/update/delete the fields in the database, we use a library called alembic.

Setting up Alembic

alembic init migrations (for a new database)
or
alembic stamp head (if the database is already populated)

Creating a migration script

alembic revision --autogenerate -m "name of your revision"

Run the migration

alembic upgrade head

Backups

Database backups are described in https://github.com/CHIMEFRB/frb-devops/tree/main/l4-terminus/postgres .