Property-Based Testing: An Introduction

Illustrated by Leandro Lassmar

Testing gets a lot of lip service, but not a lot of actual attention.
All of this talk without action distracts from the real problems with testing; instead of continuing this trend, we should work on solutions.
This article explores the problems with current testing approaches and introduces a newer idea, property-based testing, that tries to solve some of them.

Unit Testing in a Nutshell

To explain property-based testing from first principles, we first have to be clear about how testing works.
In this article, we will contrast property tests with non-property tests. (Non-property tests are often called unit tests, but this naming is not great: not all non-property tests are unit tests, and some property tests are unit tests.)

A non-property test, in its most basic expression, is a piece of code that either does or doesn't crash.
If the piece of code crashes, we say that the test fails.
Otherwise we say that the test succeeds.
Assertions are then just a nice way to make the code crash (see the following pseudocode):

shouldBeEqual (actual, expected) {
  if (actual != expected) then crash("expected: {}, actual: {}", expected, actual) else return;
}

The shouldBeEqual method here will make the code crash if its arguments are not equal.

Consider this example (in pseudocode):

myUnitTest () {
  let a = 5
  let b = 4
  let s = add(a, b)
  shouldBeEqual(s, 9)
}

We use a testing framework to check whether myUnitTest crashes or not and report a test failure if it does.
The feedback that the testing framework provides to the user here is very simple: "This test failed."
The error message in the crash will help us identify what happened and debug the problem.
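
As a rough illustration, the pseudocode above could look like this in Python. The names should_be_equal, add, and run_test are stand-ins chosen for this sketch; they are not the API of any particular testing framework.

```python
def should_be_equal(actual, expected):
    """Crash (raise) when the two values differ; otherwise do nothing."""
    if actual != expected:
        raise AssertionError(f"expected: {expected!r}, actual: {actual!r}")


def add(a, b):
    # Hypothetical function under test.
    return a + b


def my_unit_test():
    should_be_equal(add(5, 4), 9)


def run_test(test):
    """Minimal stand-in for a testing framework: a test passes if it does not crash."""
    try:
        test()
    except Exception as error:
        return f"{test.__name__}: FAILED ({error})"
    return f"{test.__name__}: passed"


print(run_test(my_unit_test))  # prints "my_unit_test: passed"
```

A real framework does much the same thing at a larger scale: it collects the tests, runs each one, and reports which ones crashed along with their error messages.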

The Coverage Problem of Unit Testing

Testing is often considered boring and tedious.

The first big reason why unit testing can feel pointless is how little confidence it provides that the code is actually correct.
To get an idea why this happens, let's look at a single hypothetical function with one argument:

function doSomethingWithAByte (a : Byte) { ... }

To test whether this function "works," you would probably write unit tests that check the result for typical usage of the function.
You would also want to try a typical case where the function does something unexpected.
However, you would probably neglect to write any tests about special values of type Byte like 0 or 255.
Moreover, the bigger the type of the argument, the more cases you would never test.

Now let's make this a more real-world example by adding more parameters.
To write unit tests for a function with two arguments, you would now also have to write tests for each interaction between the two arguments.
The number of tests you would have to write would no longer grow linearly with the size of the argument type, but quadratically.

Extending this argument shows that writing tests by hand quickly becomes infeasible as the number of arguments grows.
Indeed, the number of tests you would have to write would grow exponentially in the number of arguments to your function.
Writing tests simply does not seem to scale.
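
To put numbers on this, here is a small back-of-the-envelope calculation in Python. It assumes every argument is a Byte with 256 possible values, as in the example above; input_space_size is a helper name made up for this sketch.

```python
BYTE_VALUES = 256  # a Byte has 2**8 possible values


def input_space_size(argument_count, values_per_argument=BYTE_VALUES):
    """Number of distinct argument combinations for a function of Byte arguments."""
    return values_per_argument ** argument_count


for n in range(1, 5):
    print(f"{n} Byte argument(s): {input_space_size(n):,} combinations")
```

With four Byte arguments there are already more than four billion combinations, far more than anyone would write by hand.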

There are ways to work around the scaling problem using small types, coverage checking, weak coupling, and strong cohesion, and that is what is usually recommended.
A more head-on approach to dealing with this problem involves not writing tests but generating them instead.
This is where property-based testing comes in.

The second big reason why unit testing feels pointless is that you have to already know what "incorrect" code means in order to test whether your code is incorrect.
This causes extra frustration because you might (falsely) think that if you knew what "incorrect" meant, you would not write incorrect code in the first place.

Program testing can be used to show the presence of bugs, but never to show their absence!—Edsger Dijkstra

Property-based testing lets you discover ways in which your code is incorrect that you had never even considered.
Randomness will play a big role in making that happen in practice.

Property-Based Testing in a Nutshell

The only difference between code for a property test and code for a unit test is that code for a property test has an argument.
Here is an example property test that tests whether a reverse function is correct (in pseudocode):

myPropertyTest(list) {
  let reversed = reverse(list)
  let reversedTwice = reverse(reversed)
  shouldBeEqual(reversedTwice, list)
}

A property test serves to test whether its code ever crashes for any argument.
If the code crashes for any argument, we say that the test fails.
If the code never crashes for some predefined number of cases, we say that the test passes.

Ideally the same testing framework that you used before also supports property-based testing.
If it does, then we will now see a bit more information when a test fails.
In addition to which test failed, we will now see a counterexample for the property: the input argument that made the test fail.
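
The following Python sketch shows the idea with a hand-rolled runner. It is only an illustration: check_property, random_int_list, and the deliberately false reverse_is_identity property are names invented here, and real frameworks offer far more than this. The runner uses a fixed seed so that repeated runs see the same arguments.

```python
import random


def reverse(items):
    # Function under test (here simply Python's slicing).
    return items[::-1]


def reverse_twice_is_identity(items):
    assert reverse(reverse(items)) == items


def reverse_is_identity(items):
    # Deliberately false property, to show what a failure report looks like.
    assert reverse(items) == items


def random_int_list(rng, max_length=10):
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_length))]


def check_property(prop, generate, cases=100, seed=42):
    """Run prop on `cases` generated arguments; return the first counterexample or None."""
    rng = random.Random(seed)
    for _ in range(cases):
        argument = generate(rng)
        try:
            prop(argument)
        except Exception:
            return argument
    return None


print(check_property(reverse_twice_is_identity, random_int_list))  # prints None
print(check_property(reverse_is_identity, random_int_list))  # prints a counterexample list
```

The second call reports the argument that broke the property, which is exactly the extra piece of feedback described above.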

Arguments to the Property Test

The important question is, of course, where do we get the arguments to give to the property test?
The answer largely determines whether this approach will work in any particular case.

Exhaustive Property-Based Testing

One approach is to pass in every possible value of the type of the argument.
This approach is called exhaustive property-based testing.

Take the example of a property test for the associativity of && (in pseudocode):

associativityOfAnd(Boolean a, Boolean b, Boolean c) {
  let left = a && (b && c)
      right = (a && b) && c
  shouldBeEqual(left, right)
}

There are eight possible combinations of arguments, so we can use exhaustive property-based testing and just try them all.
This approach is infeasible in languages without static types, where the set of possible argument values is not known up front.
It also does not work well if the argument has a large type like uint64 or an unbounded type like string.
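
For the Boolean example, an exhaustive check is easy to sketch in Python with itertools.product. The helper check_exhaustively and the BOOLEANS tuple are assumptions of this sketch, not part of any framework.

```python
from itertools import product


def associativity_of_and(a, b, c):
    assert (a and (b and c)) == ((a and b) and c)


def check_exhaustively(prop, *domains):
    """Run prop on every combination of values; return (cases tried, counterexample or None)."""
    tried = 0
    for arguments in product(*domains):
        tried += 1
        try:
            prop(*arguments)
        except AssertionError:
            return tried, arguments
    return tried, None


BOOLEANS = (False, True)
print(check_exhaustively(associativity_of_and, BOOLEANS, BOOLEANS, BOOLEANS))  # prints (8, None)
```

All eight combinations are tried and none of them fails, so for this property the test amounts to a proof.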

Randomized Property-Based Testing

The most popular approach to property-based testing is randomized property-based testing.
It generates the arguments to the property test using deterministic (seeded) pseudorandomness.
A predetermined number of arguments is generated, usually a hundred.

The advantage of this approach is that it is usually much cheaper than exhaustive property-based testing.
More importantly, it is always feasible, whereas exhaustive property-based testing is not.
The disadvantage is that false negatives are possible: the hundred generated examples may just happen to miss the exact code path that would make the test fail.

If one were to use actual (unseeded) randomness, tests could also become flaky (passing sometimes but not other times).
That's why it is important to set a seed for the randomness and generate the examples deterministically.
This way, you get predictable tests without having to come up with the arguments yourself.
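
A minimal Python sketch of seeded generation follows. The function generate_arguments and the seed value 42 are arbitrary choices made for illustration.

```python
import random


def generate_arguments(seed, count=100):
    """Generate `count` integer arguments from a seeded pseudorandom generator."""
    rng = random.Random(seed)
    return [rng.randint(0, 2**64 - 1) for _ in range(count)]


first_run = generate_arguments(seed=42)
second_run = generate_arguments(seed=42)
print(first_run == second_run)  # prints True: same seed, same arguments
```

Because the generator is seeded, a failure seen on one run can be reproduced on the next run with the same seed.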

Shrinking

Larger examples tend to be more likely to be counterexamples for any given property, so the property test is run with arguments in increasing order of size.
This way any easy failures are found early.
The empty list is often such an easy counterexample.
However, the code is also more thoroughly tested with larger inputs.
Note that these inputs are much larger and more complicated than the inputs that you would ever try to write for unit tests.
The reasoning is that larger arguments are much more likely to be counterexamples than the small arguments that you might choose yourself.
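
One way to get this behavior is to generate arguments under a size bound that grows from one test case to the next. The Python sketch below assumes that design; sized_list, generate_by_size, and the default bound of 30 are made up for illustration.

```python
import random


def sized_list(rng, size):
    """Generate a list of integers whose length and magnitude are bounded by `size`."""
    length = rng.randint(0, size)
    return [rng.randint(-size, size) for _ in range(length)]


def generate_by_size(seed=42, cases=100, max_size=30):
    """Yield test arguments while the size bound grows from 0 to max_size."""
    rng = random.Random(seed)
    for case in range(cases):
        size = case * max_size // max(cases - 1, 1)
        yield sized_list(rng, size)


for argument in generate_by_size(cases=5, max_size=4):
    print(argument)
```

The first argument is always the empty list, and later arguments are allowed to become longer and to contain larger numbers.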

The problem with these large inputs is that they become unwieldy during debugging very quickly.
This is where shrinking comes in.
Here is a typical example without shrinking (in pseudocode):

myProperty(string) {
  let upperCased = string.map(makeUpperCase)
  shouldAllBeUpperCase(upperCased)
}

The myProperty property tests whether it is true that after you uppercase every character in a string, all the characters in that string are uppercase.
With shrinking turned off, you might see a counterexample like this:

"x\CAN\32937J\ENQ^\DC1?\FS\96943\&0\74134V+"

It is certainly not obvious to us why this is a counterexample for the property.
With shrinking turned on, this counterexample would be shrunk to smaller versions.
Each smaller version would be tried until the testing framework has found the smallest version of the argument that still fails the test.
It will be shrunk to this smaller counterexample:

"1"

Indeed, the "1" character is not considered uppercase, and when it is fed through makeUpperCase, it stays "1".
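
To make the idea concrete, here is a deliberately simple shrinker in Python. It only tries to drop one character at a time, whereas real frameworks typically also try to simplify individual values. The input "xJq1Vb" is a made-up counterexample, and the sketch relies on Python's str.isupper, which is False for characters without case such as "1".

```python
def make_upper_case(character):
    return character.upper()


def all_upper_case_after_mapping(text):
    upper_cased = "".join(make_upper_case(c) for c in text)
    assert all(c.isupper() for c in upper_cased)


def fails(prop, argument):
    try:
        prop(argument)
    except AssertionError:
        return True
    return False


def shrink(prop, argument):
    """Greedily drop single characters for as long as the property keeps failing."""
    current = argument
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(prop, candidate):
                current = candidate
                progress = True
                break
    return current


print(shrink(all_upper_case_after_mapping, "xJq1Vb"))  # prints 1
```

Every character that is not needed for the failure gets removed, so only the offending "1" remains.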

In this case the property just does not (and should not) hold.
However, if this is an issue with the code instead, we should start by writing a good regression test:

myRegressionTest() {
  let upperCased = "1".map(makeUpperCase)
  shouldAllBeUpperCase(upperCased)
}

Then you can start debugging and fixing the code; we trust you to do that part yourself.

Further Reading

Property-based testing is an immense topic, and this article is just an introduction.
There is plenty more to learn and try, so we suggest you get started in your programming language of choice.

Property-Based Testing in Your Language

Here are some links to implementations of property-based testing frameworks in different languages:

Ideas That Build on Property-Based Testing

Property-based testing as a field is still under active development.
What you have seen here is just the tip of the iceberg.

  • Exhaustive property-based testing: Try out all values of a given type (smallcheck)
  • Validity-based testing: Get generators and shrinking functions for free (Validity)
  • Property discovery: Discover properties to test about your code (QuickSpec, Easyspec)
  • Mutation testing: Test the quality of your test suite (Fitspec)
  • Counterexample extrapolation: Find more general counterexamples (Extrapolate)

Conclusion

Property-based testing has great potential to improve your life as a developer, but also the quality of software in general.
We believe that if we can make quality cheap enough, better software should come about naturally, eventually.
Property-based testing is a big step in that direction.

Tom Sydney Kerckhove

author

Professional Weirdo, Technical Leader and Engineering Manager. Expert functional programmer, specialised in (property) testing. Reach out on Twitter at @kerckhove_ts.

Leandro Lassmar

illustrator

Leandro Lassmar is an illustrator living in Minas Gerais, Brazil.
He has worked in animation studios and currently works for magazines, books, and advertising.
He has won the SND (Society for News Design) and ÑH - Lo Mejor del Diseño Periodístico awards.