Database¶
Datatrail uses the PostgresSQL database to store all the registered data. Throughout Datatrail, we use the SQLAlchemy ORM to communicate withthe database and perform database queries.
Schema¶
All database models are defined here.
File¶
DB Table containing name and specs (md5sum, etc.).
Storage Element¶
DB Table containing name, address of the storage location, transfer protocol it prefers and information about disk usage. * Examples of storage element: * CHIME, CANFAR, CEDAR, BELUGA
File Replica¶
DB Table that maps physical copies of files on a storage element. For each file there will be a replica on the storage elements where the file physically is.It contains information about file path on the storage element, replication, deletion and aggregation state, priority and protocols.
Dataset¶
DB Table. A collection of files or other datasets. It is used to encapsulate files and datasets for different purposes. Datasets are also where we apply replication, deletion and aggregation policies.
Hierarchy¶
- Files can belong to a dataset.
- Datasets can belong to other datasets.
- Files → dataset X→ dataset A → dataset Z …
- This allows you to have things like:
- Intensity files belong to an event X.
- Event X belongs to ClassifiedFRBs, Catalog1 and Catalog2.
Replication Policy¶
- Policy for replication of files belonging to a dataset. This policy is added to a dataset when it is created.
- Num replicas, preferred_storage_elemenets, priority.
- When new files are attached to a dataset, the policy is translated to the file and file replica tables as appropriate properties.
- When a dataset is attached to another dataset with a different set of policies, we take the least destructive action on the file replicas.
- Replicate sooner of the two policies.
- Default Policy only for Core datasets (for derived datasets, an explicit policy needs to be provided):
- Replication: num replicas 1, replication_priority low, preferred_storage_site CHIME.
Deletion Policy¶
- Policy for deletion of file replicas belonging to a dataset on storage elements. This policy is added to a dataset when it is created. For each storage element we require:
- delete_after_days, priority.
- Eg. {“CHIME”: {“delete_after_days”: 10, “priority”: “HIGH”}, “CANFAR”: {“delete_after_days”: None, “priority”: low}}
- When new files are attached to a dataset, the policy is translated to the file and file replica tables as appropriate properties.
- When a dataset is attached to another dataset with a different set of policies, we take the least destructive action on the file replicas.
- Delete on a later date of the two policies.
- Default Policy only core dataset (for derived datasets, an explicit policy needs to be provided):
- Deletion: {“Field Site”: {“delete_after_days”: 90, “priority”: “LOW”}}.
Migration¶
To add/update/delete the fields in the database, we use a library called alembic.
Setting up Alembic¶
alembic init migrations (for a new database)
or
alembic stamp head (if the database is already populated)
Creating a migration script¶
alembic revision --autogenerate -m "name of your revision"
Run the migration¶
alembic upgrade head
Backups¶
Database backups are described in https://github.com/CHIMEFRB/frb-devops/tree/main/l4-terminus/postgres .