
Registration

Registration Worker

The registration system can run in two modes: the first registers only events on a given date; the second registers all dates from the last registered day up to two days before the given date.

Nominally, Datatrail's registration system is designed to run continuously in the latter mode. The two-day delay behind today's date is a deliberate design choice: some events require human classification, and the delay gives the FRB tsars time to classify them.
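The continuous mode effectively registers a rolling window of dates. A minimal sketch of how such a window can be computed (the function name and signature are illustrative, not Datatrail's actual code; only the two-day delay comes from the description above):

```python
from datetime import date, timedelta

def dates_to_register(last_registered: date, today: date, delay_days: int = 2) -> list[date]:
    """Return the dates from the day after `last_registered` up to
    `delay_days` before `today`, leaving the FRB tsars time to classify."""
    end = today - timedelta(days=delay_days)
    n_days = (end - last_registered).days
    return [last_registered + timedelta(days=i) for i in range(1, n_days + 1)]

# e.g. last registered 2022-10-18, today 2022-10-23:
# registers 2022-10-19, 2022-10-20, and 2022-10-21.
```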

Registration CLI:

datatrail-admin registration --help

Usage: datatrail-admin registration [OPTIONS] COMMAND [ARGS]...

Registration tools.

Options:
--help Show this message and exit.

Commands:
register-datasets Register datasets into datatrail.
register-events Register events from the frb archivers.
register-failed-datasets Retry registering failed datasets into...
show-last-event-date Find date that the events have been...
show-unregistered-datasets Find all unregistered datasets.
show-unregistered-events Find all unregistered events.
  1. register-datasets Bucket-based registration system which registers payloads found in the "datatrail-register-dataset" bucket.
  2. register-events Date-based registration system which registers datasets by traversing the directories /data/chime/baseband/raw/ and /data/chime/intensity/raw, either from the "last completed date" up to two days before the present date, or for a specific date if one is specified.
  3. register-failed-datasets attempts to register datasets that failed to be registered and were recorded in the "datatrail-unregistered-datasets" bucket.
  4. show-last-event-date shows the date until which events from the archiver have been registered.
  5. show-unregistered-events shows all the events that failed to be registered. It dumps the output to a json file unregistered_events.json which contains the schema shown below.
  6. show-unregistered-datasets same as previous command but for datasets. Schema varies slightly.
JSON
[
  {
    "event_number": 247289509,
    "data_type": "intensity", #(or baseband)
    "date_captured": "2022-10-21",
    "action_picker_dataset": "realtime-pipeline",
    "reason": "Verification for event 247289509 not found in FRB Master! Check if this was a duplicate event of another."
  },
  ...
]
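Because failures tend to cluster by cause (for example during an RFI storm), it can help to tally the dump by failure mode before triaging events one by one. A small sketch; the bucketing function is our own, only the file format and the two reason strings come from the schema and troubleshooting table in this document:

```python
import json
from collections import Counter

def failure_summary(events: list[dict]) -> Counter:
    """Tally unregistered events by failure mode so that clustered
    failures stand out before triaging one by one."""
    def bucket(reason: str) -> str:
        if "not found in FRB Master" in reason:
            return "missing FRB Master verification"
        if "No actions were found" in reason:
            return "missing Action Picker record"
        return "other"
    return Counter(bucket(e["reason"]) for e in events)

# events = json.load(open("unregistered_events.json"))
# print(failure_summary(events).most_common())
```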

Troubleshooting

The table below lists the commonly occurring failure reasons and how to diagnose each situation.

Reason: Verification for event X not found in FRB Master! Check if this was a duplicate event of another.

This event was not found in FRB Master, meaning it was not sent to the tsars for verification even though it should have been. This can happen if:

  1. There was a connection issue (a network glitch, or FRB Master being down) between L4 and FRB Master.
    1. In this case, the event needs a tsar classification and must be sent for verification.
  2. It was a misgrouped event: a neighbouring event was inserted into FRB Master instead because of its higher SNR.
    1. Check whether events within +-100 of this event number are in FRB Master instead. If other events are found, compare the DM, timestamp_utc, and beam numbers to see whether it might have been misgrouped. More steps to follow on how to fix a misgrouped event.

Reason: No actions were found for event X in the Action Picker: L4 database.

This event was not found in the ActionGetIntensity or ActionGetBaseband table of the L4 database, which most likely means other tables were not populated either. This typically happens during a severe RFI storm that creates many false positives, leading to wrong callbacks; expect to see many events around the same time with this error.
In this case, we should register this data into a dataset called discard so that the event can be deleted. See Create a record in L4.

Create a record in L4

This section shows you how to create a new record in the ActionGetIntensity/ActionGetBaseband table of the L4 database. You will likely do this to deal with events that failed registration because No actions were found for event X in the Action Picker.

First, connect to the L4 node and start an IPython session:

Bash
ssh precommissioning@frb-L4
ipython
Python
In [1]: import django

In [2]: django.setup()

In [3]: from l4_databases.apps.astro_events.models import ActionGetIntensity, ActionGetBaseband, EventRegister

## Intensity record

In [25]: er = EventRegister.objects.filter(event_no__exact = 245795926)[0]

In [26]: payload = {"action_intensity": "GET_BLOCK", "intensity_width_factor": 50,
    ...:            "intensity_spectral_resolution": 16384, "intensity_priority_level": "LOW",
    ...:            "event_no": er, "requested_by": "realtime-pipeline", "dataset": "discard"}

In [28]: a = ActionGetIntensity(**payload)

In [29]: a.save()

## Baseband record

In [33]: er = EventRegister.objects.filter(event_no__exact = 245772016)[0]

In [35]: payload = {"baseband_priority_level": "LOW", "event_no": er,
    ...:            "requested_by": "realtime-pipeline", "dataset": "discard"}

In [36]: b = ActionGetBaseband(**payload)

In [37]: b.save()

Update dataset name in ActionGetIntensity and ActionGetBaseband in L4

First, connect to the L4 node and start an IPython session:

Text Only
ssh precommissioning@frb-L4
ipython

In the IPython shell:

Python
In [1]: import django

In [2]: django.setup()

In [3]: from l4_databases.apps.astro_events.models import ActionGetIntensity, ActionGetBaseband

In [4]: agi = ActionGetIntensity.objects.filter(dataset__exact="dataset-name")

In [5]: agb = ActionGetBaseband.objects.filter(dataset__exact="dataset-name")

In [6]: for a in agi:
   ...:     if a.event_no.event_no == event_no:
   ...:         a.dataset = "new dataset name"
   ...:         a.save()
   ...:         break
   ...:

In [7]: for a in agb:
   ...:     if a.event_no.event_no == event_no:
   ...:         a.dataset = "new dataset name"
   ...:         a.save()
   ...:         break
   ...:

Look for an event in FRB Master's Verification collection

This is how you query an event in FRB Master's verification collection. If an event does not exist, the query returns None. You can use this to validate neighbouring events by simply looping over their event numbers.

Python
In [8]: import chime_frb_api

In [9]: master = chime_frb_api.frb_master.FRBMaster()

In [10]: verification = master.API.get("/v1/verification/get-verification/9386707")

In [11]: print(verification)
{'data_status': {'baseband_data': False, 'intensity_data': True}, 'datetime': '2018-07-25 17:59:43.000000 UTC+0000', 'delta_rating': '0', 'final_delete': False, 'id': 9386707, 'max_rating': '5', 'quality_factors': {'beam_activity': 29, 'intensityML_grade': {'event_score': 10}, 'l1_grade': [10, 10], 'l2_grade': 9.268584999}, 'user_verification': [{'classification': 'UNCLASSIFIED', 'comments': 'Automatic Pipeline Verification', 'delete': False, 'id': 'l4_pipeline', 'rating': 1}, {'classification': 'NEW CANDIDATE', 'comments': 'Guaranteed FRB candidate', 'delete': False, 'id': 'Chitrang Patel', 'rating': 5}, {'classification': 'NEW CANDIDATE', 'comments': 'guaranteed frb candidate', 'delete': False, 'id': 'Marcus Merryfield', 'known_source': '', 'rating': 5}, {'classification': 'NEW CANDIDATE', 'comments': 'guaranteed frb candidate', 'delete': False, 'id': 'shiny', 'rating': 5}]}
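Looping that query over neighbouring event numbers (e.g. for the misgrouping check in the troubleshooting section) can be wrapped in a small helper. The `chime_frb_api` calls are as in the transcript above, but the helper itself, its name, and the default window size are our own sketch:

```python
def find_verified_neighbours(event_no, get_verification, window=100):
    """Return neighbouring event numbers (within +/- `window`) that have a
    record in FRB Master's verification collection.  `get_verification`
    should return None for events with no record, as master.API.get does."""
    hits = []
    for candidate in range(event_no - window, event_no + window + 1):
        if candidate == event_no:
            continue  # skip the event we are investigating
        if get_verification(candidate) is not None:
            hits.append(candidate)
    return hits

# Against FRB Master, for example:
# import chime_frb_api
# master = chime_frb_api.frb_master.FRBMaster()
# find_verified_neighbours(
#     247289509,
#     lambda n: master.API.get(f"/v1/verification/get-verification/{n}"),
# )
```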

Registering datasets

When registering datasets, there are two broad categories of data:

  1. Event-based (e.g. full-array baseband, intensity, tracking-beam acquisitions of pulsars as calibrators for events)

    a. Here, we need a tsar classification for the associated event before registering it, so that the appropriate policies can be applied depending on its classification. To make this a bit more challenging, tsar classifications typically take two tsar shifts, since we need two classifications; that can be anywhere from 4 to 24 hours.

  2. Non-event-based (e.g. N-squared, data captured for other WG studies)

    a. Here, we do not need a tsar classification for an associated event. The expectation is that the policies have been discussed with the DAWG and that the large dataset under which smaller datasets will be grouped has been created.

For a complete list of dataset categories for CHIME/Outriggers, see this doc.

Given the complexity, we have designed a self-healing registration system with two components:

  1. Initial registration: datatrail-admin registration register-datasets

    a. The non-event-based datasets will likely be registered successfully on the first attempt. The common failure mode is that the parent dataset does not exist.

    b. We will run this service (with multiple replicas if needed) at each site. This service needs to be able to access the data path where the files are located so that it can compute the md5sums, file sizes, etc.

    c. The service periodically looks for work in the bucket for the site it's running on. Some other system is responsible for submitting work into this bucket so that it can be registered.

    d. The event-based datasets will always fail if initial registration is attempted before two classifications are in, because the system won't know where to put them. At the same time, we don't want users or other systems to have to remember to retry periodically. Instead, we shift this problem to a single service that retries failed registration.

  2. Retrying failed registration: datatrail-admin registration register-failed-datasets

    a. This service periodically attempts to re-register failed datasets. From a database it fetches all the payload necessary to register the files of an event, with the exception of the larger dataset into which the event should be registered. This way, once the underlying problem is solved (e.g. the associated event now has two classifications), the system will re-register it and eventually succeed.

    b. We only need to run one service; best to do it on the CHIME frb-analysis cluster.

    c. Admins are expected to keep an eye out for common failure modes and develop solutions to quickly resolve them via the Datatrail CLI/API.
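Structurally, the retry component amounts to a periodic loop over stored failure payloads. A sketch of that loop with stand-in functions (`fetch_failed_payloads`, `try_register`, and `mark_registered` are hypothetical names, not the Datatrail API):

```python
import time

def retry_failed_datasets(fetch_failed_payloads, try_register, mark_registered,
                          interval_s=3600, max_cycles=None):
    """Periodically re-attempt registration of failed datasets.  Payloads
    whose underlying problem is fixed (e.g. the event now has two tsar
    classifications) succeed and are marked done; the rest stay in the
    failure store and are retried on the next cycle."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for payload in fetch_failed_payloads():
            if try_register(payload):       # succeeds once the blocker is gone
                mark_registered(payload)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
```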