Replication

The replication system comprises a stager and a worker, which are part of datatrail and datatrail-client respectively. The stager queries the datatrail server for file replicas that match the following conditions:

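# Replicas held at storage_id that are available (not already being
# replicated or deleted) and are flagged for replication to replicate_to.
replicas = (
    session.query(FileReplica)
    .filter(
        FileReplica.replication_state == TransformationState.available,
        FileReplica.deletion_state == TransformationState.available,
        FileReplica.storage_id == storage_id,
        FileReplica.replicate_to == replicate_to,
    )
    .order_by(FileReplica.id)  # oldest replicas first
    .limit(MAX_QUEUED)  # cap the number of works staged per pass
    .all()
)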

For each file replica, a work is created in a bucket named datatrail-replicator-{storage_id}-{replicate_to}. Note that there are plans to change this to one work per dataset, listing the files in that dataset that need to be replicated. Meanwhile, the worker, or replicator, continuously checks the bucket for work to perform and copies the data between the storage elements.
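
The stager and worker hand-off described above can be sketched as follows. This is a minimal illustration and not the datatrail implementation: the Replica class, the in-memory buckets dictionary, the work payload fields, and the stage and run_worker functions are all hypothetical stand-ins. Only the bucket naming pattern is taken from the text above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for a FileReplica row returned by the stager query."""

    id: int
    file_name: str
    storage_id: str
    replicate_to: str


def bucket_name(storage_id, replicate_to):
    """Build the bucket name using the pattern given in the text."""
    return f"datatrail-replicator-{storage_id}-{replicate_to}"


# Assumption: an in-memory mapping of bucket name to queued works stands in
# for the real work buckets, whose API is not described here.
buckets = defaultdict(deque)


def stage(replicas):
    """Stager side: create one work per file replica in the matching bucket."""
    for replica in replicas:
        work = {  # hypothetical payload layout
            "replica_id": replica.id,
            "file_name": replica.file_name,
            "source": replica.storage_id,
            "destination": replica.replicate_to,
        }
        buckets[bucket_name(replica.storage_id, replica.replicate_to)].append(work)


def run_worker(bucket, copy):
    """Worker side: drain the bucket once, calling copy(work) for each work.

    The real replicator checks the bucket continuously; a single pass keeps
    the sketch easy to run. Returns the ids of the replicas that were copied.
    """
    done = []
    queue = buckets[bucket]
    while queue:
        work = queue.popleft()
        copy(work)  # the caller supplies the actual transfer routine
        done.append(work["replica_id"])
    return done
```

To try it, call stage with a few Replica objects that share a storage_id and replicate_to, then call run_worker with the corresponding bucket name and any callable that performs the copy. Works are handled in the order they were staged, which mirrors the ordering by FileReplica.id in the query.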