Working with Data Readers
The fitburst codebase uses a DataReader object for the following purposes:
- to ensure that all attributes necessary for downstream analysis are initialized;
- to provide methods for normalizing and (incoherently) dedispersing the input spectrum, if necessary;
- to allow for experiment-specific modularity in pipeline settings.
A Base Class for Data Readers
A DataReader is a child class of ReaderBaseClass. The ReaderBaseClass object defines all key attributes and nearly all methods used in a typical execution of fitburst. One method of the base class, load_data(), is intentionally left undefined because its implementation depends on the format of the input data.
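The pattern described above (a base class that owns the shared attributes and methods, with one format-specific method left for subclasses) can be sketched with Python's abc module. This is a hypothetical illustration: ToyReaderBase and InMemoryReader are our names, and we do not claim that fitburst itself uses abc; the attribute names are the ones mentioned on this page.

```python
from abc import ABC, abstractmethod


class ToyReaderBase(ABC):
    """Hypothetical stand-in for ReaderBaseClass, not fitburst's actual code."""

    def __init__(self):
        # Attributes that downstream analysis expects every reader to populate.
        self.data_full = None
        self.freqs = None
        self.times = None
        self.good_freqs = None
        self.is_dedispersed = False
        self.burst_parameters = {}

    @abstractmethod
    def load_data(self, *args, **kwargs):
        """Format-specific parsing; each concrete reader must override this."""

    def num_channels(self):
        # A format-independent method that every subclass inherits unchanged.
        return len(self.freqs)


class InMemoryReader(ToyReaderBase):
    """Hypothetical reader that accepts data already held in Python lists."""

    def load_data(self, spectrum, freqs, times):
        self.data_full = spectrum
        self.freqs = freqs
        self.times = times
        self.good_freqs = [True] * len(freqs)


reader = InMemoryReader()
reader.load_data([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]], [400.0, 600.0, 800.0], [0.0, 0.001])
print(reader.num_channels())  # 3
```

Because load_data() is marked abstract, calling ToyReaderBase() directly raises a TypeError, so an incomplete reader fails immediately rather than partway through a fit.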
A Data Reader for the "Generic" Format
As an example of how to use the ReaderBaseClass, we provide a DataReader that parses the "generic" data format discussed on the preceding page. It can be imported and invoked as follows:
from fitburst.backend.generic import DataReader
# read in data stored in the "generic" format.
data = DataReader()
data.load_data("input_data.npz")
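The generic format is a NumPy .npz archive, so when a load fails, or before writing a reader for a similar file, it can help to list which arrays the archive actually contains. The helper below is a hypothetical sketch (describe_npz is our name) built only on numpy.load. It passes allow_pickle=True because an archive may hold Python objects such as a dictionary of parameters; unpickling can execute arbitrary code, so only use this on files you trust.

```python
import numpy as np


def describe_npz(source):
    """Map each array name in an .npz archive to its (shape, dtype)."""
    # allow_pickle=True is required to read object arrays (e.g. a stored dict);
    # only enable it for trusted files.
    with np.load(source, allow_pickle=True) as archive:
        return {name: (archive[name].shape, archive[name].dtype) for name in archive.files}


# Example: describe_npz("input_data.npz")
```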
Customizing Data Readers
The DataReader example shown above loads all data from the input file into the attributes instantiated by ReaderBaseClass. Importantly, all input-dependent steps (e.g., parsing the file format) are encapsulated in the load_data() method alone. fitburst can therefore be adapted to a wide range of data formats: the only new development required is a DataReader whose load_data() method correctly parses the new input format.
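As a sketch of such a custom reader, the example below parses a hypothetical .npz layout whose keys ("spectrum", "freq_mhz", "time_s", "dedispersed") differ from the generic format, and maps them onto the attributes downstream code expects. MinimalReaderBase, MyTelescopeReader, and the key names are all assumptions for illustration; a real reader would subclass fitburst's ReaderBaseClass instead.

```python
import io

import numpy as np


class MinimalReaderBase:
    """Hypothetical stand-in for ReaderBaseClass (not fitburst's code)."""

    def __init__(self):
        self.data_full = None
        self.freqs = None
        self.times = None
        self.good_freqs = None
        self.is_dedispersed = False
        self.burst_parameters = {}

    def load_data(self, *args, **kwargs):
        raise NotImplementedError("each experiment-specific reader must define load_data()")


class MyTelescopeReader(MinimalReaderBase):
    """Hypothetical reader for an .npz file with 'spectrum', 'freq_mhz', 'time_s' keys."""

    def load_data(self, source):
        with np.load(source) as f:
            self.data_full = np.asarray(f["spectrum"], dtype=float)
            self.freqs = np.asarray(f["freq_mhz"], dtype=float)
            self.times = np.asarray(f["time_s"], dtype=float)
            self.is_dedispersed = bool(f["dedispersed"])
        # Start with every channel marked usable; cleaning can refine this later.
        self.good_freqs = np.ones(self.freqs.size, dtype=bool)
        # The spectrum is assumed to have shape (num_freq, num_time).
        if self.data_full.shape != (self.freqs.size, self.times.size):
            raise ValueError("spectrum shape does not match frequency/time axes")


# Usage with an in-memory file standing in for a real data product.
buf = io.BytesIO()
np.savez(buf, spectrum=np.ones((4, 10)), freq_mhz=np.linspace(400.0, 800.0, 4),
         time_s=np.arange(10) * 1e-3, dedispersed=False)
buf.seek(0)
reader = MyTelescopeReader()
reader.load_data(buf)
```

Checking the array shape against the axes inside load_data() catches format mistakes at load time, before they surface as confusing errors during fitting.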
Cleaning and Flagging Data
The ReaderBaseClass object has a preprocess_data() method that normalizes and baseline-subtracts each channel and determines a set of "good" frequencies used by downstream fitting routines. Good and bad frequencies are identified from outliers in the variance and skewness distributions of the time-averaged spectrum.
All arguments of the preprocess_data() method are optional. Below is an example of its invocation with all arguments set to their default values:
# now apply cleaning algorithm.
data.preprocess_data(
normalize_variance = True,
skewness_range = [-3., 3.],
variance_range = [0.2, 0.8],
variance_weight = 1.,
)
The above method call will replace the original, raw spectrum stored in data.data_full with the normalized, cleaned spectrum. It will also overwrite the data.good_freqs attribute with a list of booleans that indicate which frequencies are deemed usable (True) or unusable (False).
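The cleaning step can be sketched in NumPy as follows. This is a hypothetical stand-in (clean_spectrum is our name, not a fitburst function) and it rests on assumptions we have not checked against fitburst's source: the spectrum has shape (num_freq, num_time); variance_range bounds each channel's variance after dividing by the largest channel variance; and skewness_range bounds each channel's raw sample skewness. fitburst's actual statistics and thresholds may be defined differently.

```python
import numpy as np


def clean_spectrum(spectrum, skewness_range=(-3.0, 3.0), variance_range=(0.2, 0.8)):
    """Hypothetical sketch: normalize channels and flag outliers.

    spectrum has shape (num_freq, num_time); returns (cleaned, good_freqs).
    """
    spectrum = np.asarray(spectrum, dtype=float)

    # Per-channel statistics, computed on the raw (un-normalized) data.
    resid = spectrum - spectrum.mean(axis=1, keepdims=True)
    variance = (resid ** 2).mean(axis=1)
    std = np.sqrt(variance)

    # Constant channels (e.g. already zapped) carry no information.
    good = variance > 0

    # Sample skewness: third central moment divided by std**3.
    skewness = np.zeros_like(variance)
    skewness[good] = (resid[good] ** 3).mean(axis=1) / std[good] ** 3

    # Assumed convention: variance is compared after scaling by its maximum.
    scaled_variance = np.zeros_like(variance)
    if good.any():
        scaled_variance = variance / variance[good].max()

    # Flag channels whose scaled variance or skewness falls outside the ranges.
    good &= (scaled_variance > variance_range[0]) & (scaled_variance < variance_range[1])
    good &= (skewness > skewness_range[0]) & (skewness < skewness_range[1])

    # Baseline-subtract and scale each usable channel to unit variance; zero the rest.
    cleaned = np.zeros_like(spectrum)
    cleaned[good] = resid[good] / std[good][:, None]
    return cleaned, good
```

With an upper variance bound below 1, this sketch always flags the channel with the largest variance, which is a direct consequence of the normalization assumed here. The returned mask plays the role of data.good_freqs; for example, cleaned[good] selects only the usable channels.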
Retrieving Burst Parameters
Data in the generic format store previous estimates of the burst parameters in the .npz data file. The DataReader for the generic format then stores these parameters as a Python dictionary:
# now extract parameters from npz file.
initial_parameters = data.burst_parameters
print("DM values: ", initial_parameters["dm"])
The input_data.npz file contains data for a three-component burst from FRB 121102, for which the burst-averaged DM was previously found to be 557.0 pc cm\(^{-3}\). If the .npz file was generated correctly, the above print() statement should show:
DM values: [557.0, 557.0, 557.0]
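The output above suggests that burst_parameters maps each parameter name to a list with one entry per burst component (the de-dispersion example below likewise indexes entries with [0]). Under that assumption, a small helper can regroup the dictionary by component. split_components is a hypothetical name, and it requires every value in the dictionary to be a list of the same length.

```python
def split_components(burst_parameters):
    """Convert {"name": [v0, v1, ...]} into one dict per burst component."""
    lengths = {len(values) for values in burst_parameters.values()}
    if len(lengths) != 1:
        raise ValueError("every parameter needs exactly one value per component")
    num_components = lengths.pop()
    return [
        {name: values[i] for name, values in burst_parameters.items()}
        for i in range(num_components)
    ]


# DM values quoted above for the three-component burst in input_data.npz.
components = split_components({"dm": [557.0, 557.0, 557.0]})
print(len(components), components[0])  # 3 {'dm': 557.0}
```

In practice one would call split_components(data.burst_parameters).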
Dedispersing and/or Windowing the Input Spectrum
The ReaderBaseClass contains two methods for de-dispersing and windowing the raw spectrum. These methods are optional, but may be necessary if the input spectrum spans several seconds of data and/or if the data are either still dispersed or de-dispersed with a suboptimal DM value. In all cases, the data.is_dedispersed attribute must accurately reflect whether the input spectrum is already de-dispersed (True) or not (False).
Here's an example of a de-dispersion call:
data.dedisperse(
initial_parameters["dm"][0],
initial_parameters["arrival_time"][0],
reference_freq = initial_parameters["ref_freq"][0]
)
The above call uses the input values and axis information (e.g., data.freqs, data.times, etc.) to compute a map of de-dispersion index values. These index values are then used by the window_data() method to obtain a "windowed" (i.e., zoomed-in) version of the de-dispersed spectrum:
window = 0.08  # half-width of the window, in seconds.

# before doing anything, check that the window doesn't extend beyond the data set;
# if it does, shrink it by an appropriate amount.
window_max = data.times[-1] - initial_parameters["arrival_time"][0]
if window > window_max:
    window = window_max - 0.001
    print("INFO: window size adjusted to +/- {0:.1f} ms".format(window * 1e3))

# window the de-dispersed spectrum around the first component's arrival time.
data_windowed, times_windowed = data.window_data(initial_parameters["arrival_time"][0], window=window)
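The de-dispersion and windowing steps can be sketched in NumPy: convert each channel's dispersion delay into an integer number of time samples (the "map of de-dispersion index values"), shift each channel earlier by that amount, then keep samples within ± window of the arrival time. All function names here are hypothetical, and the integer-sample shift with np.roll is a simple swapped-in technique rather than fitburst's implementation; the constant K_DM and a uniformly sampled time axis are assumptions.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3 (commonly quoted value; an assumption here).
K_DM = 4.148808e3


def dispersion_delay(freqs_mhz, dm, ref_freq_mhz):
    """Delay (s) of each frequency relative to ref_freq_mhz for a DM in pc cm^-3."""
    freqs_mhz = np.asarray(freqs_mhz, dtype=float)
    return K_DM * dm * (freqs_mhz ** -2 - float(ref_freq_mhz) ** -2)


def dedispersion_indices(freqs_mhz, times, dm, ref_freq_mhz):
    """Number of time samples by which each channel must be shifted earlier."""
    dt = times[1] - times[0]  # assumes a uniformly sampled time axis
    return np.round(dispersion_delay(freqs_mhz, dm, ref_freq_mhz) / dt).astype(int)


def dedisperse_spectrum(spectrum, freqs_mhz, times, dm, ref_freq_mhz):
    """Shift every channel (row) of a (num_freq, num_time) spectrum by its index."""
    shifts = dedispersion_indices(freqs_mhz, times, dm, ref_freq_mhz)
    out = np.empty_like(spectrum)
    for chan, shift in enumerate(shifts):
        # np.roll wraps samples around the array edges; a real reader might pad instead.
        out[chan] = np.roll(spectrum[chan], -shift)
    return out


def window_spectrum(spectrum, times, arrival_time, window):
    """Keep only samples within +/- window seconds of arrival_time."""
    keep = np.abs(times - arrival_time) <= window
    return spectrum[:, keep], times[keep]


# Synthetic check: a dispersed single-sample pulse that should line up at t = 0.5 s.
freqs = np.linspace(400.0, 800.0, 8)
times = np.arange(2000) * 0.001
shifts = dedispersion_indices(freqs, times, 10.0, 800.0)
spectrum = np.zeros((freqs.size, times.size))
spectrum[np.arange(freqs.size), 500 + shifts] = 1.0
dedispersed = dedisperse_spectrum(spectrum, freqs, times, 10.0, 800.0)
windowed, times_windowed = window_spectrum(dedispersed, times, 0.5, 0.0805)
```

Rounding to whole samples keeps the sketch simple but leaves up to half a sample of residual smearing per channel.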