Registration¶
Registration Worker¶
The registration system can be run in two modes: the first registers only events on a given date; the second registers all dates from the last registered day up to two days before the given date.
Nominally, Datatrail's registration system is designed to run continuously in the latter mode. The two-day lag behind today's date was a design choice: some events require human classification, and the lag gives the FRB tsars time to classify them.
Registration CLI:
Registration tools.
Options:
--help Show this message and exit.
Commands:
register-datasets Register datasets into datatrail.
register-events Register events from the frb archivers.
register-failed-datasets Retry registering failed datasets into...
show-last-event-date Find date that the events have been...
show-unregistered-datasets Find all unregistered datasets.
show-unregistered-events Find all unregistered events.
register-datasets
Bucket-based registration system which registers payloads found in the "datatrail-register-dataset" bucket.
register-events
Date-based registration system which registers datasets by traversing the directories /data/chime/baseband/raw/ and /data/chime/intensity/raw, either from the "last completed date" till two days before the present date, or for a specific date if specified.
register-failed-datasets
Attempts to register datasets that failed to be registered and were recorded in the "datatrail-unregistered-datasets" bucket.
show-last-event-date
Shows the date until which events from the archiver have been registered.
show-unregistered-events
Shows all the events that failed to be registered. It dumps the output to a JSON file, unregistered_events.json, which contains the schema shown below.
show-unregistered-datasets
Same as the previous command but for datasets; the schema varies slightly.
[
{
"event_number": 247289509,
"data_type": "intensity", #(or baseband)
"date_captured": "2022-10-21",
"action_picker_dataset": "realtime-pipeline",
"reason": "Verification for event 247289509 not found in FRB Master! Check if this was a duplicate event of another."
},
...
]
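The JSON dump produced by show-unregistered-events can grow quickly during an incident. A minimal sketch that groups the entries by failure reason so the dominant problem stands out (the helper name summarize_failures is ours, not part of the Datatrail CLI):

```python
import json
from collections import Counter


def summarize_failures(path="unregistered_events.json"):
    """Count how often each failure reason appears in the JSON dump."""
    with open(path) as f:
        events = json.load(f)
    # Strip the event-specific tail so variants of the same error
    # (differing only in event number) collapse into one bucket.
    reasons = Counter(e["reason"].split(" for event")[0] for e in events)
    return reasons.most_common()
```

Running this after a registration outage quickly tells you whether you are looking at one systemic problem (e.g. an RFI storm) or several unrelated ones.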
Troubleshooting¶
The table below lists the commonly occurring failure reasons and how to diagnose each situation.
Reason | Troubleshooting Steps |
---|---|
Verification for event X not found in FRB Master! Check if this was a duplicate event of another. | This event was not found in FRB Master, meaning it was not sent to the tsars for verification even though it should have been. This could happen if |
No actions were found for event X in the Action Picker: L4 database. | This means that this event was not found in the ActionGetIntensity or ActionGetBaseband table of the L4 database. Most likely, other tables were also not populated for some reason. This will likely happen during a severe RFI storm that creates tonnes of false positives, leading to wrong callbacks. In turn, this also means you are likely to see many events around the same time showing this error. In this case, register the data in a dataset called discard so that the event can be deleted. See Create a record in L4. |
Create a record in L4¶
This section shows how to create a new record in the ActionGetIntensity/ActionGetBaseband table of the L4 database. You will likely do this to deal with events that failed registration because "No actions were found for event X in the Action Picker".
Here, you can create the records in an ipython shell:
In [1]: import django
In [2]: django.setup()
In [3]: from l4_databases.apps.astro_events.models import ActionGetIntensity, ActionGetBaseband, EventRegister
## Intensity record
In [25]: er = EventRegister.objects.filter(event_no__exact = 245795926)[0]
In [26]: payload = {"action_intensity": "GET_BLOCK", "intensity_width_factor": 50, "intensity_spectral_resolution": 16384, "intensity_priority_level": "LOW", "event_no": er, "requested_by": "realtime-pipeline", "dataset": "discard"}
In [28]: a = ActionGetIntensity(**payload)
In [29]: a.save()
## Baseband record
In [33]: er = EventRegister.objects.filter(event_no__exact = 245772016)[0]
In [35]: payload = {"baseband_priority_level": "LOW", "event_no": er, "requested_by": "realtime-pipeline", "dataset": "discard"}
In [36]: b = ActionGetBaseband(**payload)
In [37]: b.save()
Update dataset name in ActionGetIntensity and ActionGetBaseband in L4¶
Here, you can update the dataset names in an ipython shell:
In [1]: import django
In [2]: django.setup()
In [3]: from l4_databases.apps.astro_events.models import ActionGetIntensity, ActionGetBaseband
In [4]: agi = ActionGetIntensity.objects.filter(dataset__exact="dataset-name")
In [5]: agb = ActionGetBaseband.objects.filter(dataset__exact="dataset-name")
In [6]: for a in agi:
...: if a.event_no.event_no == event_no:
...: a.dataset = "new dataset name"
...: a.save()
...: break
...:
In [7]: for a in agb:
...: if a.event_no.event_no == event_no:
...: a.dataset = "new dataset name"
...: a.save()
...: break
...:
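The loops above work, but the same change can be pushed down into the database with a single queryset using Django's related-field lookups and QuerySet.update(). This is a hedged sketch: rename_dataset is our helper, not part of Datatrail, and it assumes the same models and event_no foreign key used in the shell session above.

```python
def rename_dataset(model, old_name, new_name, event_no):
    """Rename the dataset on the action rows for one event.

    `model` is ActionGetIntensity or ActionGetBaseband. The lookup
    event_no__event_no follows the ForeignKey to EventRegister.event_no.
    Returns the number of rows updated.
    """
    qs = model.objects.filter(dataset__exact=old_name, event_no__event_no=event_no)
    return qs.update(dataset=new_name)
```

Letting the database do the filtering avoids pulling every row of the old dataset into Python just to find one event.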
Look for an event in FRB Master's Verification collection¶
This is how you query an event in FRB Master's verification collection. If an event does not exist, the query returns None.
You can use this to validate neighbouring events by simply looping over event numbers.
In [8]: import chime_frb_api
In [9]: master = chime_frb_api.frb_master.FRBMaster()
In [10]: verification = master.API.get("/v1/verification/get-verification/9386707")
In [11]: print(verification)
{'data_status': {'baseband_data': False, 'intensity_data': True}, 'datetime': '2018-07-25 17:59:43.000000 UTC+0000', 'delta_rating': '0', 'final_delete': False, 'id': 9386707, 'max_rating': '5', 'quality_factors': {'beam_activity': 29, 'intensityML_grade': {'event_score': 10}, 'l1_grade': [10, 10], 'l2_grade': 9.268584999}, 'user_verification': [{'classification': 'UNCLASSIFIED', 'comments': 'Automatic Pipeline Verification', 'delete': False, 'id': 'l4_pipeline', 'rating': 1}, {'classification': 'NEW CANDIDATE', 'comments': 'Guaranteed FRB candidate', 'delete': False, 'id': 'Chitrang Patel', 'rating': 5}, {'classification': 'NEW CANDIDATE', 'comments': 'guaranteed frb candidate', 'delete': False, 'id': 'Marcus Merryfield', 'known_source': '', 'rating': 5}, {'classification': 'NEW CANDIDATE', 'comments': 'guaranteed frb candidate', 'delete': False, 'id': 'shiny', 'rating': 5}]}
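Looping over neighbouring event numbers, as suggested above, can be wrapped in a small helper. This is a sketch: check_neighbours and its window size are ours; it reuses the master session from the snippet above, and only the get-verification endpoint shown there.

```python
def check_neighbours(master, event_no, window=5):
    """Return verification records for event numbers around `event_no`.

    Events with no verification record come back as None from the API
    and are dropped from the result.
    """
    found = {}
    for n in range(event_no - window, event_no + window + 1):
        v = master.API.get(f"/v1/verification/get-verification/{n}")
        if v is not None:
            found[n] = v
    return found
```

A sparse result around a failed event supports the "duplicate/never verified" diagnosis from the troubleshooting table; a dense one suggests the event itself is the outlier.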
Registering datasets¶
When registering datasets, there are two broad categories of data:
- Event based (e.g. full array baseband, intensity, tracking beam acquisition of pulsars as calibrators for events)
  a. Here, we need a tsar classification for the associated event before registering it, so that the appropriate policies can be applied depending on its classification. To make this a bit more challenging, tsar classifications typically take two tsar shifts, since we need two classifications, which could be anywhere from 4 to 24 hours.
- Non-event based (e.g. N-squared, data captured for other WG studies)
  b. Here, we do not need a tsar classification for the associated event. The expectation is that the policies have been discussed with the DAWG and that the large dataset under which smaller datasets will be grouped has already been created.
For the complete list of dataset categories for CHIME/outriggers, see this doc.
Given the complexity, we have designed a self-healing registration system in two components:
- Initial registration: datatrail register-datasets
  a. The non-event based datasets will likely have no problems being registered successfully on the first attempt. The common failure mode will be that the parent dataset does not exist.
  b. We will run this service (multiple replicas if needed) at each site. The expectation is that this service needs to be able to access the data path where the files are located so that it can compute the md5sums, file sizes, etc.
  c. This service periodically looks for work in its bucket for the site it is running on. The expectation is that some other system is responsible for submitting work into this bucket so that it can be registered.
  d. The event based datasets will always fail if the initial registration is attempted before two classifications are in, because the system won't know where to put them. At the same time, we don't want users or other systems to have to remember to retry periodically. Instead, we shift this problem to a single service: Retry failed registration.
- Retrying failed registration: datatrail register-failed-datasets
  a. This service periodically attempts to re-register failed datasets. It fetches information from a database that contains all the payload necessary to register the files of an event, with the exception of the larger dataset in which the event needs to be registered. This way, if the underlying problem is solved (e.g. the associated event now has two classifications), this system will re-register it and finally succeed.
  b. We only need to run one service. Best to do it on the CHIME frb-analysis cluster.
  c. Admins are expected to keep an eye out for common failure modes and develop solutions via the datatrail CLI/API to resolve them quickly.
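The retry component described above can be sketched as a simple loop. Everything here is hypothetical scaffolding: fetch_failed_payloads, try_register, and mark_registered are stand-ins for Datatrail internals, shown only to illustrate the self-healing behaviour.

```python
import time


def retry_failed_datasets(fetch_failed_payloads, try_register, mark_registered,
                          interval=3600, once=False):
    """Periodically re-attempt registration of failed datasets."""
    while True:
        for payload in fetch_failed_payloads():
            try:
                # Succeeds only once the blocking condition is resolved,
                # e.g. the associated event finally has two classifications.
                try_register(payload)
                mark_registered(payload)
            except Exception:
                # Still blocked: leave the payload for the next pass.
                continue
        if once:
            return
        time.sleep(interval)
```

The key design point is that failed payloads stay queued and are retried unconditionally, so no human has to remember to come back once a classification lands.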