Replication
The replication system comprises a stager and a worker, which are part of datatrail and datatrail-client respectively. The stager queries the datatrail server for file replicas that match the following conditions:
Python
    session.query(FileReplica)
        .filter(
            FileReplica.replication_state == TransformationState.available,
            FileReplica.deletion_state == TransformationState.available,
            FileReplica.storage_id == storage_id,
            FileReplica.replicate_to == replicate_to,
        )
        .order_by(FileReplica.id)
        .limit(MAX_QUEUED)
        .all()
For each file replica, a work is created in a bucket named datatrail-replicator-{storage_id}-{replicate_to}. Note that there are plans to change this to one work per dataset, listing the files in that dataset that need to be replicated.
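The staging step described above can be sketched roughly as follows. This is a minimal illustration, not the datatrail implementation: the `Replica` dataclass, the `bucket_name` and `stage` helpers, the shape of each work, the in-memory dictionary standing in for the buckets, and the `site_a`/`site_b` storage names are all assumptions made for the example. Only the bucket naming pattern comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for a FileReplica row returned by the query."""

    id: int
    storage_id: str
    replicate_to: str
    path: str


def bucket_name(storage_id, replicate_to):
    # Naming pattern taken from the documentation text.
    return f"datatrail-replicator-{storage_id}-{replicate_to}"


def stage(replicas, buckets):
    """Create one work per replica in the bucket for its source/destination pair.

    `buckets` is an in-memory mapping (assumption) used in place of the real
    work queue.
    """
    for replica in replicas:
        name = bucket_name(replica.storage_id, replica.replicate_to)
        buckets[name].append({"replica_id": replica.id, "path": replica.path})
    return buckets


# Hypothetical storage names and file paths, for illustration only.
buckets = stage(
    [
        Replica(1, "site_a", "site_b", "data/a.h5"),
        Replica(2, "site_a", "site_b", "data/b.h5"),
    ],
    defaultdict(list),
)
```

Under the planned change, `stage` would presumably group replicas by dataset and emit one work listing several files, rather than one work per replica.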
The worker, or replicator, continuously checks the bucket for work to perform and
copies the data between the storage elements.
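A polling worker of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the datatrail-client code: the bucket is modelled as a `deque` of dictionaries with hypothetical `source` and `destination` keys, the copy is a local `shutil.copy2` call (real storage elements may need a different transfer mechanism), and `max_idle_polls` exists only so the example terminates, whereas the real replicator runs continuously.

```python
import shutil
import tempfile
import time
from collections import deque
from pathlib import Path


def run_worker(bucket, copy=shutil.copy2, poll_interval=1.0, max_idle_polls=None):
    """Poll `bucket` for work and copy each listed file to its destination.

    With max_idle_polls=None the loop never exits, which mirrors a
    continuously running worker.
    """
    completed = []
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        if not bucket:
            # Nothing to do: wait before checking the bucket again.
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        work = bucket.popleft()
        Path(work["destination"]).parent.mkdir(parents=True, exist_ok=True)
        copy(work["source"], work["destination"])
        completed.append(work)
    return completed


# Usage: copy one file between two temporary directories acting as storage elements.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "storage_a" / "file.dat"
    dst = Path(tmp) / "storage_b" / "file.dat"
    src.parent.mkdir(parents=True)
    src.write_bytes(b"payload")
    work_queue = deque([{"source": str(src), "destination": str(dst)}])
    done = run_worker(work_queue, poll_interval=0.0, max_idle_polls=1)
    copied_ok = dst.read_bytes() == b"payload"
```

Passing the copy function as a parameter keeps the loop independent of how data actually moves between storage elements.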