Property Based Testing

In software testing, we often want to examine a space of possibilities, and check that something remains true in the entire space.

A nice approach for this is “property-based testing”. It can take some work and creativity to find ways to use it, but when you find a good one, property-based testing has a lot of power.

The examples below will all use the hypothesis library:

Toy Example: Inverse Operations

Here’s a toy example: let’s say we have a serialize method that turns some text into binary, and a deserialize method that turns binary back into text. These are inverse operations, which makes it easy to think about possibility spaces to examine.

For example, for the entire possibility space of “pieces of text”, this should be true:

assert input_text == deserialize(serialize(input_text))

Traditionally, we’d dream up a bunch of input examples ourselves and implement them one-by-one as unit tests:

def test_serialize_hello():
    assert 'hello' == deserialize(serialize('hello'))

def test_serialize_with_spaces():
    assert 'this has spaces' == deserialize(serialize('this has spaces'))

def test_serialize_emoji():
    assert 'wow 😀😀😀 wow' == deserialize(serialize('wow 😀😀😀 wow'))

# ...and so on

We can DRY that up a bit using parameterized tests:

import pytest

@pytest.mark.parametrize("input_text", [
    'this has spaces',
    'wow 😀😀😀 wow',
def test_serialize(input_text):
    assert input_text == deserialize(serialize(input_text))

But, we’re still limited by the examples we can think of - we won’t come up with the really weird ones that expose edge cases.

Better to let the robots create examples for us:

from hypothesis import given, strategies as st

def test_serialize(input_text):
    assert input_text == deserialize(serialize(input_text))

Without further configuration, that test will come up with 200 strings of unicode - it will come up with stranger things than you can readily imagine, and will give you a replicable example when it finds a string that fails this test. You can configure the hypothesis library to come up with however many examples you want - you could for example search 200 during fast pre-commit test runs, but run 100,000 examples for a nightly CI run.

This is a powerful and ergonomic way to search a possibility space, and it’s called “property based testing” - the idea is that we want to check that some “property” is true throughout the entire space, so we sample widely within the space and look for counterexamples. These counterexamples point us toward bugs with our software.

Real Example: Implementation Migrations

Another is a case where you have a working implementation of a piece of logic to use as a reference, and you want to replace it with another implementation that behaves identically.

We had a case like that recently in the a python application I maintain at work. We had some logic intended to synchronize a database’s representation of data to data in an external system: it first deleted all database records, then created new records based on data from the external system.

This worked fine, but during the several minutes the process took to run, lots of database records were deleted and not yet recreated, leading to various strange user-facing bugs.

We wanted to replace it with a new interface that would avoid these up-front deletes, so we wrote a new implmentation, and wrote a property-based test to ensure that the implementations behaved identically.

The details aren’t terribly important - the key idea is that we came up with a test input class that encapsulated the various logical differences between scenarios we cared about, told hypothesis to generate all permutations of these scenarios, and checked that the old and new implementations produced identical database states, given the various initial conditions prescribed by the scenarios. We did indeed find weird edge case bugs in the process, and each one is a bug we now won’t need to deal with in production.

Future Example: Stateful Testing

I don’t have a good example for this one at my current employer yet, but I’m on the lookout. There is a class of software quality techniques called “formal methods” - the most popular tools for using these techniques are specialty languages called TLA+ and Alloy.

The elevator pitch for formal methods is: you write a formal specification for a system, specify things about it that should always be true (invariants) and the tool will search the possibility space and show you sequences of legal steps that lead to illegal states (bugs, in other words).

Think about, say, a distributed user management system, where users can be created, deleted, and their permissions can be modified. Say you want to make sure that deleted users can never take actions - you could write a model with a formal methods language describing the state changes in that system and express your requirement as an invariant, and ask the tool to find steps of user-management actions that leave the user in a state where their account has been deleted, but they’re still able to take actions, maybe because of a cache invalidation bug somewhere.

As of 2014, AWS had used formal methods for system design for S3, DynamoDB, EBS, and and a distributed lock manager, I’m sure they’ve used them more since.

Formal methods are even more powerful than property-based testing, but have a considerably steeper learning curve - I’ve bounced off the topic a few times in the last few years, but have yet to use formal methods for anything useful.

However, property-based testing gets at the same concept, and is a lot easier to get started with. Hypothesis supports this idea: you can use it to define “rules-based state machine” with steps and invariants, and it will similarly look for sequences of steps that violate invariants.

To get a feel for this sort of thing, you can use it to solve logic puzzles like the Die Hard water-jug problem, the Towers of Hanoi puzzle, etc.

More to come on that if/when I find a good Guild-relevant use case.

If you’re interested in this

I recommend reading the introductory articles for hypothesis, and then reading some of their documentation. Both links can be found below.

It’s not always terribly easy to find places to apply property-based testing, but I bet they exist in your application, and adopting a tool like this can improve your life. Give it a try!