
Deletion

Summary

This document outlines a proposal for automatically deleting files registered in Datatrail once each file's deletion policy says it is due for deletion.

Motivation

Goals

The main purpose of Datatrail is to manage CHIME/FRB's raw data. One of its primary features is to automatically delete files from storage elements, as specified by the deletion policy. This ensures that our storage is cleaned up regularly. The system must be very careful before deleting the last existing copy of a file.

Issues can happen if the file was supposed to be replicated to another storage element but that did not happen before the expiry date of the file at the current storage element.

For example, say that we have a file at CHIME and its replication policy mandates that there must be an additional copy at Minoc. In addition, the deletion policy for the file is to delete the data from CHIME after 10 days but never delete it from Minoc. In this case, the replication daemon has 10 days to catch up and copy this file to Minoc, but for some reason (a transfer glitch, or anything else) that does not happen. 10 days later, the deletion daemon identifies that the replica at CHIME has reached its expiry and can be deleted. Because the additional copy at Minoc does not yet exist, the deletion daemon must not stage this file for deletion.
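The check described in this example could be sketched roughly as follows. This is only an illustration: the `Replica` class, its fields, and the `safe_to_delete` function are hypothetical names, not part of Datatrail, and the sketch assumes the replication policy can be reduced to a list of storage elements that must hold a copy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Replica:
    # Hypothetical record of one copy of a file on one storage element.
    storage_element: str          # e.g. "chime" or "minoc"
    exists: bool                  # True once the copy is confirmed present
    delete_after: Optional[date]  # None means "never delete"


def safe_to_delete(target: Replica, replicas: Iterable[Replica],
                   required_elements: Iterable[str], today: date) -> bool:
    """Return True only if `target` has expired and every other
    storage element required by the replication policy holds a copy."""
    if target.delete_after is None or today < target.delete_after:
        return False  # not expired yet, or never expires
    present = {r.storage_element for r in replicas if r.exists}
    others = set(required_elements) - {target.storage_element}
    return others <= present  # all other required copies must exist


# The scenario above: the CHIME copy has expired, the Minoc copy is missing.
chime = Replica("chime", True, date(2024, 1, 11))
minoc = Replica("minoc", False, None)
blocked = safe_to_delete(chime, [chime, minoc], ["chime", "minoc"], date(2024, 1, 11))
```

Here `blocked` is `False`, so the file would not be staged until the Minoc copy exists. The sketch does not cover the case where no other storage element is required; how to treat a genuinely last copy is left to the policy.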

Non-Goals

Proposal

User Stories

User Story 1

After the callbacks happen, Datatrail will register the events after two days. At that point, the deletion policy for all the files belonging to an event will be set. We don't want users to conduct manual cleanup, since it's a lot of data and deleting things by hand is scary. The automated deletion system should be able to read the deletion policy of the file replicas at all storage elements and automatically delete them after the date specified by the policy.

System Components

This system will contain 4 components:

  1. Deletion Staging Daemon
  2. Buckets (for queuing work for deleters)
  3. Deleters
  4. State Updater Daemon (to update the state of the database after deletion of a file replica)

Deletion Staging Daemon

The Deletion Staging Daemon will run at the CHIME main server site to create work for the deleters. It will perform read queries directly against the database to identify which files are to be deleted. It will also contain the logic to ensure that the system is not blindly deleting the only copy of a file. It is the starting point of the automated deletion system. It will:

  1. Check how many files are pending in the deleters' buckets so that the queue does not overflow. It will not create more work if N files in the bucket are already waiting to be deleted.
  2. Perform the database query to identify the next M files to delete.
  3. Create a work object per file for the deleters and deposit it in their bucket.
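One pass of these three steps might look like the sketch below. The function name and its parameters are hypothetical, and the bucket and database are passed in as plain callables because the proposal does not specify their APIs. `max_pending` and `batch_size` stand for N and M.

```python
from typing import Callable, Dict, Iterable


def stage_deletions(count_pending: Callable[[], int],
                    fetch_next: Callable[[int], Iterable[Dict]],
                    deposit: Callable[[Dict], None],
                    max_pending: int,
                    batch_size: int) -> int:
    """Run one staging pass and return the number of files staged."""
    # Step 1: do not create more work if the bucket already holds N files.
    if count_pending() >= max_pending:
        return 0
    staged = 0
    # Step 2: ask the database for the next M files that are safe to delete.
    for record in fetch_next(batch_size):
        # Step 3: one work object per file, using the deleter payload fields.
        deposit({"file_id": record["file_id"], "file_path": record["file_path"]})
        staged += 1
    return staged
```

Keeping the backlog check ahead of the query means a stalled deleter only ever leaves a bounded amount of staged work behind.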

Deleters

Deleters will be responsible for performing the actual deletion of the file. You can have X deleters running at each site to perform X parallel file deletions, since each deleter handles only one file at a time. Each deleter will:

  1. Withdraw work from its bucket.
  2. Perform the deletion using the appropriate protocol for the storage element that it is cleaning.
  3. Create a work object for the State Updater Daemon containing the deletion status of the current file.
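A single deleter iteration could be sketched as below. All names are hypothetical. The sketch assumes a storage element reachable as a local filesystem, so `os.remove` stands in for "the appropriate protocol", and the status fields `deleted` and `error` are placeholders because the state updater payload is not yet defined in this proposal.

```python
import os
from typing import Callable, Dict, Optional


def delete_local(path: str) -> None:
    """Assumed protocol for a locally mounted storage element."""
    os.remove(path)


def run_deleter_once(withdraw: Callable[[], Optional[Dict]],
                     delete: Callable[[str], None],
                     report: Callable[[Dict], None]) -> bool:
    """Process at most one work object; return False if the bucket was empty."""
    work = withdraw()  # Step 1: withdraw work from the bucket.
    if work is None:
        return False
    try:
        delete(work["file_path"])  # Step 2: perform the deletion.
        status = {"file_id": work["file_id"], "deleted": True, "error": None}
    except OSError as exc:
        # Report the failure instead of raising, so the database is not
        # updated for a file that is still on disk.
        status = {"file_id": work["file_id"], "deleted": False, "error": str(exc)}
    report(status)  # Step 3: create work for the State Updater Daemon.
    return True
```

Passing `delete` as a parameter lets each site plug in the protocol that matches its storage element while the surrounding loop stays the same.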

State Updater

The state updater daemon will be responsible for updating the state of the deleted file in the database. There is no need for multiple state updaters. A single state updater should suffice. The same state updater daemon will also handle the logic for updating the replication state when a file is replicated.

It will:

  1. Withdraw work from its bucket.
  2. Update the state of the file in the database.
  3. Conclude work with the appropriate status.
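As a rough illustration of the database update in step 2, the sketch below uses an SQLite connection as a stand-in for the real database. The `file_replicas` table, its `state` column, the `deleted` field of the incoming status, and the `"success"`/`"failure"` return values are all assumptions made for the example, not Datatrail's actual schema or API.

```python
import sqlite3
from typing import Dict


def apply_deletion_status(conn: sqlite3.Connection, status: Dict) -> str:
    """Record a deleter's result and return the status to conclude the work with."""
    if not status["deleted"]:
        # The file is still on disk, so the database must not change.
        return "failure"
    with conn:  # commits on success, rolls back if the update raises
        cursor = conn.execute(
            "UPDATE file_replicas SET state = 'deleted' WHERE file_id = ?",
            (status["file_id"],),
        )
    # Exactly one row should match; anything else signals a bookkeeping problem.
    return "success" if cursor.rowcount == 1 else "failure"
```

Running the update through a single daemon keeps database writes serialized, which fits the note above that one state updater should suffice.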

Design Details

Database Query and Logic to identify the files to delete.


Payload of the work object for deleters

{
    "file_id": int,  # database ID of the file to delete
    "file_path": str,  # complete path to the file's location on the storage element
}

Payload of the work object for the state updater daemon

{
    ""
}