CHIME/FRB Guidelines when using Python

Here we assume that you know Python and can write basic scripts. If this is not the case, please refer to website.

Documentation

In Code

For in-code documentation we follow the standards developed by the NumPy community. Some of these standards are shared below, but for an in-depth read, check out their official guide.

DocStrings

A documentation string (docstring) is a string that describes a module, function, class, or method definition. A docstring should include the following components:

  • A short summary
  • An optional extended summary
  • Parameters
  • Attributes - required by class __init__
  • Methods - maps to exposed functions for class
  • Returns or Yields for generators
  • Raises - if required
  • Notes (optional)
  • References (optional)
  • Examples (optional)
Example
class Foo(object):
    """Short description of the class

    Optional longer description of the class

    Attributes
    ----------
    x : float
        The X coordinate
    y : float
        The Y coordinate

    Methods
    -------
    bar(var1: list, var2: int, var3: str ='hi', *args, **kwargs)
        Some small description of bar
    """

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def bar(self, var1: list, var2: int, var3: str = "hi", *args, **kwargs):
        r"""Short summary of the code

         Several sentences providing an extended description. Refer to
         variables using back-ticks, e.g. `var`.

         Parameters
         ----------
         var1 : array_like
             Array_like means all those objects -- lists, nested lists, etc. --
             that can be converted to an array.  We can also refer to
             variables like `var1`.
         var2 : int
             The type above can either refer to an actual Python type
             (e.g. ``int``), or describe the type of the variable in more
             detail, e.g. ``(N,) ndarray`` or ``array_like``.
         var3 : {'hi', 'ho'}, optional
             Choices in brackets, default first when optional.
         *args : iterable
             Other arguments.
         **kwargs : dict
             Keyword arguments.

         Returns
         -------
         describe : type
             Explanation of return value named `describe`.
         out : type
             Explanation of `out`.

         Raises
         ------
         BadException
             Because you shouldn't have done that.

         Notes
         -----
         Notes about the implementation algorithm (if needed).

         This can have multiple paragraphs.

         You may include some math:

         .. math:: X(e^{j\omega } ) = x(n)e^{ - j\omega n}

         And even use a Greek symbol like :math:`\omega` inline.

         References
         ----------
         Cite the relevant literature, e.g. [1]_.  You may also cite these
         references in the notes section above.

         .. [1] O. McNoleg, "The integration of GIS, remote sensing,
           expert systems and adaptive co-kriging for environmental habitat
           modelling of the Highland Haggis using object-oriented, fuzzy-logic
           and neural-network techniques," Computers & Geosciences, vol. 22,
           pp. 585-588, 1996.

         Examples
         --------
         These are written in doctest format, and should illustrate how to
         use the function.

         >>> a = [1, 2, 3]
         >>> print([x + 3 for x in a])
         [4, 5, 6]
        """

        pass
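As a shorter illustration of the same layout applied to a plain function, here is a minimal sketch. The function scale_values and its parameters are hypothetical names chosen for this example, not part of any CHIME/FRB code. The final lines run the docstring examples with the standard-library doctest module.

```python
import doctest


def scale_values(values, factor=2.0):
    """Multiply every value in a sequence by a constant factor.

    Parameters
    ----------
    values : array_like
        Numbers to scale.
    factor : float, optional
        Multiplier applied to each value. Default is 2.0.

    Returns
    -------
    scaled : list of float
        The scaled values, in the same order as `values`.

    Raises
    ------
    ValueError
        If `values` is empty.

    Examples
    --------
    >>> scale_values([1, 2, 3])
    [2.0, 4.0, 6.0]
    >>> scale_values([1, 2], factor=0.5)
    [0.5, 1.0]
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return [float(v) * factor for v in values]


if __name__ == "__main__":
    # Run the examples embedded in the docstring above.
    doctest.testmod()
```

Keeping the Examples section in doctest format means the documentation can be checked automatically, so it is less likely to drift from the code.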

Comments

Always make sure that comments are up-to-date with changes in the code, as comments that contradict the code are worse than no comments at all!

Comments should be written in complete sentences with the first word capitalised. Block comments generally consist of one or more paragraphs built from complete sentences ending in a full stop. You should use two spaces after a sentence-ending full stop in multi-sentence comments, except for after the final sentence.

Block comments

These apply to some or all of the code that follows them, and are indented to the same level as the code they describe. Each line starts with a hash followed by a single space (# ). Paragraphs inside block comments are separated by a line containing a single hash.

Correct

def quadratic(a, b, c, x):
    # Calculate the solution to a quadratic equation using the quadratic
    # formula.
    #
    # There are always two solutions to a quadratic equation, x_1 and x_2.
    x_1 = (-b + (b**2 - 4*a*c) ** (1/2)) / (2*a)
    x_2 = (-b - (b**2 - 4*a*c) ** (1/2)) / (2*a)
    return x_1, x_2

Inline comments

These should be used sparingly. Inline comments are placed on the same line as the code they describe. They should be separated from the code by at least two spaces and start with a hash and a single space (# ).

Wrong

some_variable = quadratic(a, b, c, x) #Solves the equation for x

Correct

some_variable = quadratic(a, b, c, x)  # Solves the equation for x

Inline comments are distracting and so should not be used if they state the obvious.

Wrong

x = x + 1  # Increment x

But sometimes, this is useful:

Correct

x = x + 1  # Compensate for border

Style Conventions

Import Statements

Use import statements for packages and modules only, not for individual classes or functions. Note that there is an explicit exemption for imports from the typing module.

Correct

import x  # For importing packages and modules.
from x import y  # Where x is the package prefix and y is the module name with no prefix.
from x import y as z  # If two modules named y are to be imported or if y is inconveniently long.
import y as z  # Only when z is a standard abbreviation, e.g. import numpy as np

Imports should usually be on separate lines:

Correct

import os
import sys

Wrong

import os, sys

However, it is okay to do this:

Correct

from subprocess import Popen, PIPE

Imports are always put at the top of the file, just after module comments and docstrings, but before constants. Imports should be grouped in the following order:

  1. Standard library imports.
  2. Related third party imports.
  3. Local application/library specific imports.

A blank line should be placed between groups.
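The grouping rule can be sketched as the top of a module. This is only an illustration: the third-party and local imports are left as comments because they assume packages that may not be installed, and my_package is a hypothetical name.

```python
"""Module docstring comes first, before any imports."""

# 1. Standard library imports.
import os
import sys

# 2. Related third party imports (assumed to be installed).
# import numpy as np

# 3. Local application/library specific imports (hypothetical package).
# from my_package import my_module

# Constants follow the imports.
DEFAULT_OUTPUT_DIR = os.path.join(os.getcwd(), "output")
PYTHON_VERSION = sys.version_info[:2]
```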

Naming Convention

When naming anything, the names that you give should be meaningful and descriptive. Avoid using throwaway names such as "temp". They should also follow the naming convention below:

  • package_name
  • module_name
  • ClassName
  • method_name
  • ExceptionName
  • function_name
  • CONSTANT_NAME
  • var_name
  • function_parameter_name
  • local_var_name
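To show how these conventions look together, here is a small sketch. Every name in it (MIN_SAMPLE_COUNT, EmptyInputError, SampleBuffer, summarise_samples) is invented for this example.

```python
MIN_SAMPLE_COUNT = 1  # CONSTANT_NAME


class EmptyInputError(Exception):  # ExceptionName
    """Raised when there are too few samples to average."""


class SampleBuffer:  # ClassName
    """Hold samples and report simple statistics."""

    def __init__(self, sample_values):  # function_parameter_name
        self.sample_values = list(sample_values)

    def mean_value(self):  # method_name
        if len(self.sample_values) < MIN_SAMPLE_COUNT:
            raise EmptyInputError("no samples to average")
        sample_total = sum(self.sample_values)  # local_var_name
        return sample_total / len(self.sample_values)


def summarise_samples(sample_values):  # function_name
    """Return the mean of `sample_values` using a SampleBuffer."""
    return SampleBuffer(sample_values).mean_value()
```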

Type Annotations

Get to know PEP 484. You are not required to annotate all the functions in a module, but:

  • Use judgement to get to a good balance between safety and clarity on the one hand, and flexibility on the other.
  • Annotate code that is prone to type-related errors (previous bugs or complexity).
  • Annotate code that is hard to understand.
  • Annotate code as it becomes stable from a types perspective. In many cases, you can annotate all the functions in mature code without losing too much flexibility.

def another(thing: list) -> list:
    return thing


def something(self, first_var: int) -> int:
    pass
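A slightly fuller sketch, using the typing module (for which the import rule above makes an explicit exemption). The function mean_of is a hypothetical example, chosen because an empty input is the kind of case where a return type annotation helps the reader.

```python
from typing import Optional, Sequence


def mean_of(values: Sequence[float]) -> Optional[float]:
    """Return the mean of `values`, or None for an empty sequence."""
    if not values:
        return None
    return sum(values) / len(values)
```

The Optional[float] return type tells callers that they must handle None, which would otherwise only be discoverable by reading the function body.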

Indentation

Indentations should use four spaces per indentation level.

Correct

# Aligned with opening delimiter:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Add 4 spaces (extra indentation level) to distinguish args
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

Wrong

# Arguments on first line forbidden without vertical alignment
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation not distinguishable
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

The closing brace/bracket/parenthesis on multi-line constructs may be lined up under the first non-whitespace character of the last line of the list:

Correct

my_list = [
    1, 2, 3,
    4, 5, 6,
    ]

or it may be lined up under the first character of the line that starts the multi-line construct:

Correct

my_list = [
    1, 2, 3,
    4, 5, 6,
]

Testing

Writing test code and running the tests alongside development is best practice. It can also be a useful tool to help define your code's purpose more precisely.

General guidelines:

  • Each test unit should focus on a tiny bit of functionality and prove it correct.

  • Each test should run independently, regardless of the order that they are called in. The implication of this rule is that each test must be loaded with a fresh dataset and may have to do some cleanup afterwards.

  • Try hard to write tests that run fast, of the order of a few milliseconds.

  • Always run the full testing suite both before and after a coding session. This will give you confidence that you did not break anything.

  • Use hooks that run tests before pushing to a repository.

  • Long test function names are fine, as they are never called explicitly. test_square_of_number2() and test_square_negative_number2() are fine and descriptive, which is important as test names are displayed when a test fails.

  • As with all of your code, write descriptive and clear comments.
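As a minimal sketch of the independence guideline above, each test below builds its own fresh data, so the tests pass in any order. The helper make_dataset and the test names are hypothetical.

```python
def make_dataset():
    """Build a fresh dataset so no test sees another test's changes."""
    return [3, 1, 2]


def test_sort_orders_values_ascending():
    data = make_dataset()
    data.sort()
    assert data == [1, 2, 3]


def test_append_adds_one_value_to_the_end():
    data = make_dataset()
    data.append(4)
    assert data == [3, 1, 2, 4]
```

If both tests shared one module-level list instead, the second test would see the sorted data whenever the first test ran before it, and the outcome would depend on execution order.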

Testing with pytest

Installing pytest

To install pytest, simply run:

pip install pytest

Test files are located in the tests directory and have either the test_ prefix or _test suffix. Within the test file, all test functions should start with test_.

The simplest type of test is an assertion, where we assert that something is true. To see this in action, create a new test file called test_sample.py, paste in the following code and save the file.

Simple test

# Content of test_sample.py
def func(x):
    return x + 1

def test_answer():
    assert func(3) == 4

Here we have a simple function func() which takes an integer and adds one to it. The test is the function below, test_answer(), which inputs the value three into func() and expects an output of four. When we start the test by running pytest in the same directory, we get the following output:

============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/name/dir
collected 1 item

test_sample.py .                                                         [100%]

============================== 1 passed in 0.03s ===============================

You can see that the function works as intended and the test passes. Now if we change line 6 to falsely assert that it should equal 5, this is the output that we receive when a test fails.

============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/name/dir
collected 1 item

test_sample.py F                                                         [100%]

=================================== FAILURES ===================================
_________________________________ test_answer __________________________________

    def test_answer():
>       assert func(3) == 5
E       assert 4 == 5
E        +  where 4 = func(3)

test_sample.py:6: AssertionError
=========================== short test summary info ============================
FAILED test_sample.py::test_answer - assert 4 == 5
============================== 1 failed in 0.10s ===============================

In this output, pytest tells us exactly where the error occurred, and the summary shows that the test failed because we are trying to assert that four equals five.

Multiple tests in a file

With pytest it is also possible to place multiple tests in a single file. You may also want to group them in a class, as done below.

Grouping tests in a Class

# content of test_class.py
class TestClass:
    def test_one(self):
        x = "this"
        assert "h" in x

    def test_two(self):
        x = "hello"
        assert hasattr(x, "check")

Danger

Be aware though that each test has a unique instance of the class. Try to foresee what will happen in the example test below, then run the test.

# content of test_class_demo.py
class TestClassDemoInstance:
    value = 0

    def test_one(self):
        self.value = 1
        assert self.value == 1

    def test_two(self):
        assert self.value == 1

Test fixtures

Test functions can request fixtures they require by declaring them as arguments. When pytest runs a test, it looks at the parameters in that test function, and searches for fixtures that have the same names as those parameters. Once pytest finds them, it runs those fixtures, captures what they returned (if anything), and passes those objects into the test function as arguments.

Essentially, the pytest fixture system gives the ability to define a generic setup step that can be reused over and over, just like a normal function would be used. Two different tests can request the same fixture and have pytest give each test their own result from that fixture.

This is extremely useful for making sure tests aren’t affected by each other. We can use this system to make sure each test gets its own fresh batch of data and is starting from a clean state so it can provide consistent, repeatable results.

Fixture example

import pytest


class Fruit:
    def __init__(self, name):
        self.name = name
        self.cubed = False

    def cube(self):
        self.cubed = True


class FruitSalad:
    def __init__(self, *fruit_bowl):
        self.fruit = fruit_bowl
        self._cube_fruit()

    def _cube_fruit(self):
        for fruit in self.fruit:
            fruit.cube()


# Arrange
@pytest.fixture
def fruit_bowl():
    return [Fruit("apple"), Fruit("banana")]


def test_fruit_salad(fruit_bowl):
    # Act
    fruit_salad = FruitSalad(*fruit_bowl)

    # Assert
    assert all(fruit.cubed for fruit in fruit_salad.fruit)

Parameterising of test arguments

A built-in decorator, pytest.mark.parametrize, enables the parameterisation of arguments for a test function. This allows a compact way of testing various argument values in a test.

Parameterisation example

# content of test_expectation.py
import pytest


# The last case fails when run, because eval("6*9") is 54, not 42.
@pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
def test_eval(test_input, expected):
    assert eval(test_input) == expected

Note that it is also possible to combine multiple parameterised arguments by stacking parametrize decorators in order to work through all combinations. The example below will set the values of (x,y) in the following order: (0,2), (1,2), (0,3), and (1,3).

Stacking parameterised decorators

import pytest


@pytest.mark.parametrize("x", [0, 1])
@pytest.mark.parametrize("y", [2, 3])
def test_foo(x, y):
    pass
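The order described above, with x changing fastest, can be reproduced in plain Python. This sketch only imitates the ordering and does not use pytest itself. The names x_values, y_values and combinations are illustrative.

```python
x_values = [0, 1]
y_values = [2, 3]

# y is the outer loop and x the inner loop, so x changes fastest.
combinations = [(x, y) for y in y_values for x in x_values]

print(combinations)  # [(0, 2), (1, 2), (0, 3), (1, 3)]
```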

Code Test Coverage

This is an important metric to help you understand how well tested your software is. There are multiple criteria that can be used to determine if your code was tested. A few examples include: function coverage, branch coverage, and line coverage. It is important that when working on a repository, you try not to decrease the test coverage with your pull request.

Installing the pytest-cov coverage tool

To measure the coverage of our code we can use pytest-cov, which is installed by simply running:

pip install pytest-cov

Below is an example script and corresponding tests:

# Content of calculator.py
class Calculator:
    def add(x, y):
        return x + y

    def subtract(x, y):
        return x - y

    def multiply(x, y):
        return x * y

    def divide(x, y):
        if y == 0:
            return 'Cannot divide by 0'
        return x * 1.0 / y

# Content of test_calculator.py
from calculator import Calculator


def test_add():
    assert Calculator.add(1, 2) == 3.0
    assert Calculator.add(1.0, 2.0) == 3.0
    assert Calculator.add(0, 2.0) == 2.0
    assert Calculator.add(2.0, 0) == 2.0
    assert Calculator.add(-4, 2.0) == -2.0

def test_subtract():
    assert Calculator.subtract(1, 2) == -1.0
    assert Calculator.subtract(2, 1) == 1.0
    assert Calculator.subtract(1.0, 2.0) == -1.0
    assert Calculator.subtract(0, 2.0) == -2.0
    assert Calculator.subtract(2.0, 0.0) == 2.0
    assert Calculator.subtract(-4, 2.0) == -6.0

def test_multiply():
    assert Calculator.multiply(1, 2) == 2.0
    assert Calculator.multiply(1.0, 2.0) == 2.0
    assert Calculator.multiply(0, 2.0) == 0.0
    assert Calculator.multiply(2.0, 0.0) == 0.0
    assert Calculator.multiply(-4, 2.0) == -8.0

def test_divide():
    assert Calculator.divide(1, 2) == 0.5
    assert Calculator.divide(1.0, 2.0) == 0.5
    assert Calculator.divide(0, 2.0) == 0
    assert Calculator.divide(-4, 2.0) == -2.0

Copy and paste the above script and tests to files with the corresponding names. To run the tests and see their coverage use the following.

Run tests and display coverage

pytest --cov

Result

============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/name/dir
plugins: cov-3.0.0
collected 4 items

test_calculator.py ....                                                 [100%]

---------- coverage: platform darwin, python 3.9.12-final-0 ----------
Name                           Stmts   Miss  Cover
--------------------------------------------------
calculator.py                     11      1    91%
test_calculator.py                25      0   100%
--------------------------------------------------
TOTAL                             36      1    97%


============================== 4 passed in 0.11s ===============================
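The single missed statement in calculator.py is most likely the return 'Cannot divide by 0' line, since none of the tests above call divide with y equal to zero. A sketch of an extra test that would exercise it is shown below. The test name test_divide_by_zero is hypothetical, and only the divide method is repeated so that the snippet is self-contained.

```python
class Calculator:
    # Only the method under discussion is repeated here.
    def divide(x, y):
        if y == 0:
            return 'Cannot divide by 0'
        return x * 1.0 / y


def test_divide_by_zero():
    # Exercises the y == 0 branch that the other tests never reach.
    assert Calculator.divide(1, 0) == 'Cannot divide by 0'
```

Running pytest --cov with the optional --cov-report=term-missing flag lists the line numbers that were not executed, which makes gaps like this one easier to find.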

Codecov

An additional tool for visualising code coverage and integrating it into your repositories is Codecov. To familiarise yourself with it, we recommend their tutorial, which can be found here.

Further reading for testing

For further information on using pytest, please refer to the documentation.

Practice what you have learnt here

There is an interactive repository designed to help you digest and practice the skills that you have learnt here. Navigate to the repository and follow the instructions: https://github.com/tjzegmott/python-style_guide