Archive for December 2015

Probabilistic Models on Trial


There are many modes of evidence accepted in courts of law. Each mode has its strengths and weaknesses, which will usually be highlighted to suit either side of the case. For example, if a witness places the defendant at the scene of the crime, the defense lawyer will attack her credibility. If fingerprint evidence is lacking, the prosecution will say it's because the defendant was careful. Will inferences from probabilistic models ever become a mode of evidence?

It's a natural idea. Courts institutionally engage with uncertainty. They use phrases like "beyond reasonable doubt", they talk about the balance of evidence, and they weigh trade-offs between error types reminiscent of precision and recall (it is "better that ten guilty persons escape than that one innocent suffer", according to the English jurist William Blackstone). And the closest thing we have to a science of uncertainty is Bayesian modelling.
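One way to make the Blackstone quote quantitative (an interpretation for illustration, not anything courts actually compute) is to read the 10:1 ratio as a cost ratio and derive a conviction threshold:

```python
# A decision-theoretic reading of Blackstone's ratio: if a false
# conviction is taken to be 10x as costly as a false acquittal,
# a risk-neutral court should convict only when the probability of
# guilt exceeds 10/11 (about 0.909).
cost_false_conviction = 10.0  # Blackstone's "ten guilty persons"
cost_false_acquittal = 1.0    # "... one innocent suffer"

# Convict iff expected cost of convicting < expected cost of acquitting:
# (1 - p) * 10 < p * 1  =>  p > 10 / 11
threshold = cost_false_conviction / (cost_false_conviction + cost_false_acquittal)
print(f"convict only if P(guilt | evidence) > {threshold:.3f}")
```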

In a limited sense we already have probabilistic models as a mode of evidence. For example, the most damning piece of evidence in the Sally Clark cot death case was the testimony of a paediatrician who said that the chance of two cot deaths happening in one household, without malicious action, is 1 in 73 million. This figure was wrong because the model assumptions were wrong -- there could be a genetic or non-malicious environmental component to cot death, but this was not captured in the model. But as the Sally Clark case vividly illustrates, inferential evidence is currently gatekept by experts. In that sense, the expert is the witness, and the court rarely interacts with the model itself. The law has a long history of attacking witness testimony. But what will happen when we have truly democratized Bayesian inference?
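To see how the model assumptions drive the number: the widely reported basis for the 1 in 73 million figure was squaring a per-family rate of about 1 in 8,543, which amounts to assuming the two deaths were independent. A quick sketch, with a purely hypothetical dependent alternative:

```python
# Sketch of the independence assumption behind the Sally Clark figure.
# The per-family rate is the widely reported 1 in 8,543; the
# conditional probability in the alternative model is made up purely
# for illustration.
p_first = 1 / 8543

# Flawed model: second death treated as independent of the first.
p_both_independent = p_first ** 2  # ~ 1 in 73 million

# Alternative model: a shared (e.g. genetic) risk factor makes a
# second death far more likely once a first has occurred.
p_second_given_first = 1 / 100  # hypothetical
p_both_dependent = p_first * p_second_given_first  # ~ 1 in 854,300

print(f"independent: 1 in {1 / p_both_independent:,.0f}")
print(f"dependent:   1 in {1 / p_both_dependent:,.0f}")
```

The point is not the particular numbers, but that the headline probability is entirely a product of the independence assumption.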

Perhaps one day, in a courtroom near you, the defense and prosecution will negotiate an inference method, then present alternative models for explaining the data relevant to the case. The lawyers will use their own model to argue their side of the case while attacking the opposing side's model.

In what scenarios would probabilistic models be an important mode of evidence?

When there are large amounts of ambiguous data, too large for people to fit into their heads, and even too large/complex to visualize without making significant assumptions.

Consider a trove of emails between employees of a large corporation. The prosecution might propose a network model to support the accusation that management was active or complicit in criminal activities. The defense might counter-propose an alternative model that shows that several key players outside of the management team were the most responsible and took steps to hide the malfeasance from management.

In these types of cases, one would not hope for, or expect, a definitive answer. Inferences are witnesses and they can be validly attacked from both sides on the grounds of model assumptions (and the inference method).

If this were to happen, lawyers would quickly become model criticism ninjas, because they would need model criticism skills to argue their cases. Who knows, maybe those proceedings will make their way onto courtroom drama TV. If so, probabilistic model criticism will enter the public psyche the same way jury selection, unreliable witnesses, and reasonable doubt have. The expertise will come from machines, not humans, and the world will want to develop ever richer language and concepts for attacking the conclusions of that expertise.

The Paradox of Epistemic Risk

Does your attitude to risk change based on the type of uncertainty you harbour? This is a blog post about epistemic risk vs. non-epistemic risk.

Here is a quote from theconversation.com:

"Australians [have] an 8.2% chance of being diagnosed with bowel cancer over their lifetime [...] If we assume that a quarter of the Australian population eats 50 grams per day of processed meat, then the lifetime risk for the three-quarters who eat no processed meat would be 7.9% (or about one in 13). For those who eat 50 grams per day, the lifetime risk would be 9.3% (or about one in 11)."
There are at least two ways to interpret the above quote:

Option 1:

  • there is a 9.3% chance of getting bowel cancer for processed meat eaters and a 7.9% chance for non-processed-meat eaters; genes don't matter

Option 2:

  • there is an x% chance of having the genes that make you susceptible to bowel cancer
  • if you have the genes that make you susceptible: there is a high chance of getting bowel cancer if you eat meat
  • if you don't have the genes that make you susceptible: it doesn't matter what you do, there is a low chance of getting bowel cancer

In either case, we can assume the marginal probability of getting bowel cancer is the same (i.e., we can adjust the percentages in Option 2 to make them match Option 1).

If you're a processed meat eater, look at Option 1 and ask yourself: is never eating bacon or a burger again worth a 1.4-percentage-point reduction in risk? I'm not sure what my answer is, which means that the choices are fairly balanced for me.
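The adjustment can be checked numerically. The Option 2 numbers below (fraction susceptible, baseline risk) are made up; the point is only that they can be chosen so the diet-conditional risks match Option 1 exactly:

```python
# Option 1: risk depends only on diet (figures from the quote).
risk_eater, risk_non_eater = 0.093, 0.079

# Option 2: hypothetical gene mixture, solved so the marginals match.
x = 0.10    # assumed fraction of people with susceptible genes
low = 0.02  # assumed risk for everyone else, regardless of diet
high_if_eating = (risk_eater - (1 - x) * low) / x   # risk: bad genes + meat
high_if_not = (risk_non_eater - (1 - x) * low) / x  # risk: bad genes, no meat

# Marginalizing over genes recovers the Option 1 figures.
assert abs(x * high_if_eating + (1 - x) * low - risk_eater) < 1e-9
assert abs(x * high_if_not + (1 - x) * low - risk_non_eater) < 1e-9
print(f"susceptible: {high_if_eating:.0%} risk if eating meat, {high_if_not:.0%} if not")
```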

Now look at Option 2. Does your answer change? For a rational agent it should not change. My inner monologue for Option 2 goes as follows: if I have the bad genes, then I'm definitely screwing myself over by eating processed meat, and I want to avoid doing that.

But you don't get to know which genes you have (at least, not yet; that will probably change in the next few years), so the main source of risk is epistemic. That is, you already have the genes that you have (tautological though it is to say); you just don't know which kind you have.

Here's what I think is going on: as we go about our lives we have to "satisfice", which means that we focus on actions that are expected to make big differences and try to avoid fiddling at the margins. Option 1 looks a lot like fiddling at the margins to me. Option 2 instead seems to give me more control: if I have the bad genes, then I have much greater control over the risk of a terrible illness. But the greater control is illusory: as long as I remain uncertain about the state of my genes, the utility of eating or not eating processed meat is the same under both Option 1 and Option 2. I call this the paradox of epistemic risk.
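The equivalence is easy to verify: an agent who doesn't know their genotype averages over it, so for any gene mixture whose diet-conditional marginals equal the quoted 9.3% and 7.9%, the expected risk reduction from abstaining is identical to Option 1's. The mixture below is hypothetical but marginal-matching:

```python
# Expected benefit of abstaining, under each option, for an agent
# who does not know their genotype. Option 2 numbers are hypothetical,
# chosen so the diet-conditional marginals equal 9.3% / 7.9%.
x, low = 0.10, 0.02                  # P(susceptible genes), baseline risk
high_eat, high_abstain = 0.75, 0.61  # risks given susceptible genes

reduction_option1 = 0.093 - 0.079
reduction_option2 = (x * high_eat + (1 - x) * low) - (x * high_abstain + (1 - x) * low)

assert abs(reduction_option1 - reduction_option2) < 1e-9
print(f"risk reduction from abstaining: {reduction_option1:.1%} either way")
```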

I must say that I'm ignorant of the latest research in psychology; any references to related ideas are welcome -- please leave a comment!