Run custom processes on the cluster
This is an example of how to run processes on the cluster using the function `spawn_baseband_job` in `chime_frb_api`. Edit the file names and contents to suit your needs.

Note that the cluster does not have infinite resources and baseband jobs are usually heavy; use this feature with care, to avoid making it necessary to put limits on cluster use.
File "/data/frb-archiver/baseband_test/cluster_test/job.py"
# This job will be run in the cluster
# In the cluster, "/data/frb-archiver" is accessible under the name "/frb-archiver" and "/data/frb-baseband" is accessible under the name "/baseband-archiver"
# Here is an example to read a beamformed baseband file and write the intensity of the first channel to a numpy array
import numpy as np
from baseband_analysis.core import BBData
file_in = "/frb-archiver/baseband_test/xy_localization/localization_34605375.h5"
data = BBData.from_file(file_in)
channel = data["tiedbeam_power"][0,0]
np.save("/frb-archiver/baseband_test/cluster_test/cluster_test", channel)
File "/data/frb-archiver/baseband_test/cluster_test/start_job.py"
# This script executes a command in the cluster.
# It must be executed inside a baseband container in frb-analysis, see https://github.com/CHIMEFRB/baseband-analysis/wiki/Usage-of-the-Docker-system
import chime_frb_api
# Load the API
base_url_master = "http://frb-vsop.chime:8001"
master = chime_frb_api.frb_master.FRBMaster(base_url=base_url_master)
# Submit a job
response = master.swarm.spawn_baseband_job(
34605375, # Event ID, for tracking
"Daniele_cluster_test", # Process name, for tracking and monitoring. Insert your name and/or an explanatory one!
command=["python", "/frb-archiver/baseband_test/cluster_test/job.py"], # Command to run in the cluster
job_mem_limit=1*1024**3 # Maximum memory usage in bytes. Processes exceeding this will be killed automatically without notice
)
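If you need to run the same script over several events, you can simply call `spawn_baseband_job` in a loop. A minimal sketch, reusing only the parameters shown above (the second event ID is a placeholder, not a real example):

```python
# Sketch: submit the same job for several events.
# 34605376 is a placeholder event ID.
for event_id in [34605375, 34605376]:
    master.swarm.spawn_baseband_job(
        event_id,
        "Daniele_cluster_test",
        command=["python", "/frb-archiver/baseband_test/cluster_test/job.py"],
        job_mem_limit=1 * 1024**3,
    )
```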
To check the status of all baseband jobs, you can run `python /path/to/baseband-analysis/automated_pipeline/manage_pipeline.py get_job_status`. To check your job specifically, you can run `python /path/to/baseband-analysis/automated_pipeline/manage_pipeline.py get_job_status -job_name baseband-34605375-Daniele_cluster_test`. In general, the job name has the form `baseband-<Event ID>-<Process name>-<Random characters>`; the script will report all jobs whose names begin with the value passed to `-job_name`.
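If you prefer to check from Python rather than the command line, the swarm module may expose a status call directly. A hedged sketch, assuming a `get_job_status` method exists in your `chime_frb_api` version (verify the method and its return format before relying on it):

```python
import chime_frb_api

# Hedged sketch: query job status directly from Python instead of the CLI.
# get_job_status is an assumption about the swarm module; verify it exists
# in your chime_frb_api version.
master = chime_frb_api.frb_master.FRBMaster(base_url="http://frb-vsop.chime:8001")
status = master.swarm.get_job_status("baseband-34605375-Daniele_cluster_test")
print(status)
```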
A job status of `preparing` means that the system is busy and the job will start as soon as possible. Jobs currently processing are marked as `running`. If a job finishes without errors it is marked as `complete`, and as `failed` otherwise.
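Once the example job above is marked as `complete`, you can check its output from frb-analysis. The paths are the ones from `job.py`, translated through the path mapping described earlier; note the `.npy` suffix that `np.save` appends:

```python
import numpy as np

# The job wrote to /frb-archiver/... inside the cluster, which is seen as
# /data/frb-archiver/... from frb-analysis (np.save added ".npy").
channel = np.load("/data/frb-archiver/baseband_test/cluster_test/cluster_test.npy")
print(channel.shape, channel.dtype)
```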
It is your responsibility to clean the job history. Completed jobs can be removed with `python /path/to/baseband-analysis/automated_pipeline/manage_pipeline.py prune_jobs` (this will clean completed jobs for the whole baseband group, so don't panic if you don't see your completed job in the list). Any job, completed or not, can be removed with `python /path/to/baseband-analysis/automated_pipeline/manage_pipeline.py kill_jobs -job_name <JOB NAME>`. A list of the jobs that are going to be killed will be shown; read it carefully before confirming. Note that all jobs whose names start with `<JOB NAME>` will be selected to be killed, so be specific about which jobs you want to kill. If you kill a job by accident, inform the person indicated in the job name if possible, or me if not.
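Cleanup can likely also be done through the API. A hedged sketch, assuming the swarm module exposes `prune_jobs` and `kill_job` in your `chime_frb_api` version (verify both before use; the same prefix-matching caveat applies here):

```python
import chime_frb_api

# Hedged sketch: clean up jobs via the API instead of manage_pipeline.py.
# prune_jobs and kill_job are assumptions about the swarm module; verify
# they exist in your chime_frb_api version before running this.
master = chime_frb_api.frb_master.FRBMaster(base_url="http://frb-vsop.chime:8001")
master.swarm.prune_jobs("baseband-34605375")  # remove completed jobs with this prefix
master.swarm.kill_job("baseband-34605375-Daniele_cluster_test")  # kill a specific job
```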