That's not fair! A spike in our gut, a flare of anger, the weight of resentment. Fairness is in our nature. Humans are deeply socialized with a moral intuition for fairness; computers are not. But can computers be programmed with a functional substitute for fairness? This question is urgent today as more and more decisions are made by statistical algorithms.
But how can we make an algorithm fair without moral intuition? Many approaches have been proposed. Unfortunately they are almost all heuristic, meaning they provide a rule of thumb that is open to interpretation and lacks a coherent underlying theory or framework. Counterfactual fairness, an approach proposed recently by a group of researchers from The Alan Turing Institute, is more principled and resolves many of the shortcomings of past approaches.
Counterfactual fairness poses the question, "If a protected personal attribute were hypothetically changed, would the system's decision change?" Consider the example of a woman applying to university. We could set gender as the "protected attribute," that is, an attribute that we believe is a source of unfair bias. We'd ask, "If she were a man, would she have been accepted into this university?" When the answer is different in the real case and the hypothetical case, we say the decision was counterfactually unfair.
Importantly, the counterfactual fairness approach is much richer than simply hiding or ignoring the protected attribute. Consider our woman applying to university. In both the real case and the hypothetical, the candidate has the same "X-factor": intelligence, commitment, determination, and hustle. However her opportunities and circumstances are different. In the hypothetical scenario we attempt to model the effects of those differences across a lifetime, not just at the moment of the application. If our person with the same innate ability had experienced their lifetime in the switched gender, in this case as a male, would they have been accepted? This takes into account the downstream effects that would have been caused by the hypothetical change in the protected attribute.
How can you quantify the effects of a counterfactual when hypothetically changing the protected attribute? By using the relatively new field of causal modeling, pioneered by Judea Pearl and others. Causal modeling extends the classical observational tools of statistics (e.g., there is a correlation between smoking, stained teeth, and cancer) to reasoning about causes and intervention (e.g., stopping smoking will reduce cancer risk, but whitening your teeth will not). Causal modeling, or simply causality, provides the tools necessary to reason about what could cause what, design the experiments necessary to empirically test proposed causal relationships, and importantly, quantify the effect of an intervention or counterfactual scenario.
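The difference between observing and intervening can be made concrete with a small calculation. The sketch below assumes a toy model in which smoking causes both stained teeth and cancer, and stained teeth have no effect on cancer. Every probability in it is a made-up number chosen for illustration, and the function names are hypothetical.

```python
# Toy causal model: Smoking -> Stained Teeth, Smoking -> Cancer.
# All probabilities are hypothetical, chosen only for illustration.
P_SMOKER = 0.3
P_STAINED_GIVEN_SMOKER = {True: 0.8, False: 0.1}
P_CANCER_GIVEN_SMOKER = {True: 0.2, False: 0.02}


def p_smoker(smoker):
    """Prior probability of being (or not being) a smoker."""
    return P_SMOKER if smoker else 1 - P_SMOKER


def p_cancer_given_stained(stained):
    """Observational: P(cancer | teeth observed as stained or not)."""
    numerator = denominator = 0.0
    for smoker in (True, False):
        p_teeth = P_STAINED_GIVEN_SMOKER[smoker]
        if not stained:
            p_teeth = 1 - p_teeth
        numerator += p_smoker(smoker) * p_teeth * P_CANCER_GIVEN_SMOKER[smoker]
        denominator += p_smoker(smoker) * p_teeth
    return numerator / denominator


def p_cancer_do_stained(stained):
    """Interventional: P(cancer | do(teeth)).

    Setting tooth color by hand cuts the Smoking -> Stained Teeth arrow,
    so tooth color no longer tells us anything about smoking. The argument
    is deliberately unused: the answer does not depend on it.
    """
    return sum(p_smoker(s) * P_CANCER_GIVEN_SMOKER[s] for s in (True, False))


def p_cancer_do_smoker(smoker):
    """Interventional: P(cancer | do(smoking)). Smoking is a direct cause."""
    return P_CANCER_GIVEN_SMOKER[smoker]


if __name__ == "__main__":
    print("observe stained teeth:", round(p_cancer_given_stained(True), 3))
    print("observe clean teeth:  ", round(p_cancer_given_stained(False), 3))
    print("force teeth white:    ", round(p_cancer_do_stained(False), 3))
    print("force teeth stained:  ", round(p_cancer_do_stained(True), 3))
    print("stop smoking:         ", round(p_cancer_do_smoker(False), 3))
```

With these invented numbers, people observed to have stained teeth have a cancer rate of about 0.16 against about 0.04 for people with clean teeth, yet forcing the teeth to be white or stained leaves the cancer rate unchanged. Only intervening on smoking moves it.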
Consider the following causal diagram for an illustrative "red car" example:
In the diagram above the boxes are attributes, and the arrows indicate causality, that is, that one attribute causes another. In this example there are four attributes: "Ethnicity," "Red Car," "Risky Behavior," and "Car Crash." We can observe "Ethnicity" and "Red Car" but cannot directly observe "Risky Behavior." "Car Crash" is our prediction target and is only available in historical training data. The protected attribute is "Ethnicity." Our goal is to predict future car accidents.
Following the arrows we can see that "Risky Behavior" is a direct cause of "Car Crash" and also of "Red Car": we believe risky drivers are more likely to buy red cars. The arrow from "Ethnicity" to "Red Car" indicates that car color can also be caused by belonging to a red-car-preferring ethnic group. The direction of the arrows matters: if I paint someone's car red it clearly does not affect their ethnicity.
The arrows that aren't present are just as important as the arrows that are. There is no arrow from "Ethnicity" to "Risky Behavior." In this causal model ethnicity does not cause risky behavior. The "Noise" nodes indicate our awareness that these causal relationships aren't perfect. Other factors outside the model will affect the attributes, but we are choosing to treat these unspecified factors as external random variables. A full causal model will also quantify the functional relationships between nodes and specify distributions for each noise variable. Causal models explicitly represent the causal basis of bias, so our assumptions can be clearly identified and tested.
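The arrows described in the text can be written down as a small data structure. The sketch below is one possible encoding, not the researchers' own: it lists each attribute with the attributes it directly causes, leaves out the "Noise" nodes for brevity, and uses a hypothetical helper, descendants, to find everything the protected attribute influences.

```python
# Edges of the red-car diagram as described in the text: cause -> effects.
# The "Noise" nodes are omitted here to keep the sketch short.
CAUSAL_GRAPH = {
    "Ethnicity": ["Red Car"],
    "Risky Behavior": ["Red Car", "Car Crash"],
    "Red Car": [],
    "Car Crash": [],
}

PROTECTED = "Ethnicity"
TARGET = "Car Crash"


def descendants(graph, node):
    """Return every attribute that `node` causes, directly or indirectly."""
    seen = set()
    stack = list(graph[node])
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(graph[child])
    return seen


# Attributes influenced by the protected attribute.
caused_by_protected = descendants(CAUSAL_GRAPH, PROTECTED)

# Attributes that are neither protected, nor caused by the protected
# attribute, nor the prediction target itself.
not_caused_by_protected = (
    set(CAUSAL_GRAPH) - caused_by_protected - {PROTECTED, TARGET}
)

if __name__ == "__main__":
    print("caused by protected attribute:", caused_by_protected)
    print("not caused by protected attribute:", not_caused_by_protected)
```

In this encoding "Red Car" is the only attribute downstream of "Ethnicity," and "Risky Behavior" is the only candidate input that ethnicity does not influence, which matches the missing arrow the text points out.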
If we built an algorithm to recommend insurance premiums and naively used "Red Car" to adjust them, the result would be biased against the ethnicity that prefers red cars. To adjust the insurance premium fairly, we need to infer as much as we can about "Risky Behavior" given observations of "Ethnicity" and "Red Car," then decide using only the a posteriori beliefs about "Risky Behavior." If "Risky Behavior" were directly observed, building a counterfactually fair model would be easy: because it is not directly or indirectly caused by "Ethnicity," we could just use it directly. "If I had a different ethnicity but the same risky driving behavior, would I get the same premium?" Yes, if the decision was made only on driving behavior. The difficulty is that "Risky Behavior" is not directly observable; we need to infer it from other evidence. So long as the inference of risky behavior given ethnicity and car color is done using Bayesian methods within the causal model, subsequent decisions based on that inference will be counterfactually fair because they only use information from attributes not caused by ethnicity. Furthermore, ethnicity can "explain away" a red car, reducing the predicted risky behavior.
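The "explaining away" effect can be seen in a few lines of Bayes' rule. This is a minimal sketch under strong assumptions: every attribute is binary, ethnicity is coded 0 or 1 with group 1 as the red-car-preferring group, and all the probabilities are invented for illustration. The function names are hypothetical.

```python
# Hypothetical numbers for the red-car example; none come from real data.
P_RISKY = 0.2      # prior P(risky), the same for both groups (no arrow
                   # from Ethnicity to Risky Behavior)
P_GROUP_1 = 0.5    # share of drivers in the red-car-preferring group

# P(red car | ethnicity, risky), keyed by (ethnicity, risky).
P_RED_GIVEN = {
    (0, 0): 0.05,
    (0, 1): 0.40,
    (1, 0): 0.30,
    (1, 1): 0.60,
}


def posterior_risky(ethnicity, red_car):
    """P(risky | ethnicity, red_car) by Bayes' rule inside the causal model."""
    def likelihood(risky):
        p_red = P_RED_GIVEN[(ethnicity, risky)]
        return p_red if red_car else 1 - p_red

    joint_risky = P_RISKY * likelihood(1)
    joint_safe = (1 - P_RISKY) * likelihood(0)
    return joint_risky / (joint_risky + joint_safe)


def naive_posterior_risky(red_car):
    """P(risky | red_car), ignoring ethnicity altogether."""
    def likelihood(risky):
        total = 0.0
        for ethnicity, p_group in ((0, 1 - P_GROUP_1), (1, P_GROUP_1)):
            p_red = P_RED_GIVEN[(ethnicity, risky)]
            total += p_group * (p_red if red_car else 1 - p_red)
        return total

    joint_risky = P_RISKY * likelihood(1)
    joint_safe = (1 - P_RISKY) * likelihood(0)
    return joint_risky / (joint_risky + joint_safe)


if __name__ == "__main__":
    print("naive, red car:  ", round(naive_posterior_risky(True), 3))
    print("group 0, red car:", round(posterior_risky(0, True), 3))
    print("group 1, red car:", round(posterior_risky(1, True), 3))
```

Under these assumed numbers, a red car is weaker evidence of risky driving for a member of group 1 than the naive estimate suggests, because group membership already accounts for part of the preference. A premium based on the group-aware posterior would therefore be lower for that driver than one based on car color alone.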
Counterfactual fairness can be understood as a highly structured and rigorous form of affirmative action. A counterfactually fair system will in most practical cases adjust outcomes to be more favorable to disadvantaged groups. However the method of calculation is rigorous and repeatable; any person or computer calculating with the same statistical relationships will arrive at the same adjustment. It is highly structured because it provides a formal framework where assumptions can be evaluated empirically, tested, and criticized.
One often proposed fairness criterion is the equal false positive rate. In the case of recidivism prediction, we may require that the false positive rate be equal across protected groups. The Northpointe COMPAS system was criticized by ProPublica because the proportion of black defendants incorrectly predicted to reoffend when they did not was higher than the proportion of white defendants incorrectly predicted to reoffend. However the creators of this statistical model countered that it was fair in another way: it was "calibrated." It had equal predictive accuracy for both black and white defendants. Unfortunately these two notions of fairness cannot be achieved together, except in special situations. Counterfactual fairness can resolve incongruities like this by bringing in another critical piece of knowledge that is not available from statistics alone: the causal relationships that lead to bias. Instead of requiring ad hoc statistical prescriptions, we can trace the complete causal structure of bias to eliminate it.
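A toy calculation shows how the two criteria can pull apart. The sketch below uses invented confusion-matrix counts for two hypothetical groups with different underlying reoffense rates; it is not COMPAS data. As a simple stand-in for "equal predictive accuracy," it uses the positive predictive value: the share of people predicted to reoffend who actually did.

```python
# Invented counts for two hypothetical groups; not real data.
# tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
GROUPS = {
    "group_a": {"tp": 60, "fp": 30, "fn": 40, "tn": 70},   # 100 of 200 reoffend
    "group_b": {"tp": 30, "fp": 15, "fn": 20, "tn": 135},  # 50 of 200 reoffend
}


def false_positive_rate(counts):
    """Share of people who did NOT reoffend but were predicted to."""
    return counts["fp"] / (counts["fp"] + counts["tn"])


def positive_predictive_value(counts):
    """Share of people predicted to reoffend who actually did."""
    return counts["tp"] / (counts["tp"] + counts["fp"])


def summarize(groups):
    """Return {group: (PPV, FPR)} for each group."""
    return {
        name: (positive_predictive_value(c), false_positive_rate(c))
        for name, c in groups.items()
    }


if __name__ == "__main__":
    for name, (ppv, fpr) in summarize(GROUPS).items():
        print(f"{name}: PPV={ppv:.3f}  FPR={fpr:.3f}")
```

In this made-up example a positive prediction is equally trustworthy in both groups, yet people in group_a who do not reoffend are flagged three times as often as their counterparts in group_b. The gap comes from the different base rates, which is the kind of special situation the text refers to.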
At first the attempt to mathematize something as intuitive, organic, and (dare I say it) human as fairness with a computer may seem wrongheaded. But I argue this intuition is unreliable. There are many problems that were thought beyond the ability of a computer: chess, Go, and poker are immediate historical examples of tasks that many thought required the human spark, only to be fully mathematized via game theory, tree search, and machine learning. These games do have fixed rules, but tasks such as speech recognition, image recognition, and natural language processing have no fixed rules, and computers have achieved significant practical performance in these domains, too. So why not a practical mathematical approach to fairness? The tools are ready to be put to the test.
Counterfactual fairness is a new approach to fairness in machine learning, statistical models, and algorithms. It draws on the new field of causality to go beyond statistical relationships and correlations and to model the root causes of differences between protected groups. A mathematized notion of fairness removes the fluff and emotion from fairness, allowing us to clearly model and compare our beliefs about causal relationships in the world, and to empirically test our claims about bias and how it affects decision-making.
Counterfactual fairness is an improvement on simple thresholds and statistical rules, giving us a rich theoretical framework to ask questions about fairness. Tools like this are the road to ensuring that algorithms—constantly making decisions about us—do so fairly.