A Causal Study of Simulated Patient Recovery Data
I'm currently reading "Do No Harm", by Henry Marsh, a fascinating autobiography of a neurosurgeon. Since I am also reading about causality in statistics, I naturally noticed points in the book where confounding might confuse beliefs about a doctor's skill. Here I consider two approaches used in causal analysis to overcome confounding.
Imagine you're in charge of a bunch of hospitals and are looking at the survival statistics of all the patients with a certain illness (e.g., brain tumours) to help you decide which doctors to promote. (This is probably not how doctors are promoted, but anyway). You might be tempted to pick the doctors with the best survival rates. This is wrong because it ignores the assignment mechanism of patients to doctors. Let's see how wrong you could be with a specific example.
doctor id | num patients | survival rate |
---|---|---|
0 | 117 | 0.205 |
1 | 85 | 0.812 |
2 | 93 | 0.129 |
3 | 88 | 0.216 |
4 | 98 | 0.153 |
5 | 112 | 0.830 |
6 | 89 | 0.809 |
7 | 109 | 0.138 |
8 | 102 | 0.824 |
9 | 107 | 0.841 |
Let's make a set of statistical modelling assumptions about how this data was generated.
Assumptions:

- Each patient \(n\) has a binary covariate \(\theta_n\) indicating whether their illness is severe.
- Each doctor \(d\) has a binary skill \(\beta_d\) indicating whether they are highly skilled.
- Assignment is noisy: severe patients are more likely to be assigned to highly skilled doctors, and non-severe patients to the rest.
- A patient's probability of recovery is \(p_\mathrm{high}\) if their illness is not severe, \(p_\mathrm{low}\) if it is severe and their doctor is highly skilled, and \(p_\mathrm{very\; low}\) if it is severe and their doctor is not.

What this generative process is saying is that we should try to assign highly skilled doctors to severely ill patients and let the rest of the doctors take care of the rest of the patients, but the process is noisy: there is a non-zero probability that an average doctor will care for a very ill patient, or that a very skilled doctor will care for a less ill patient.
Why this recovery probability? If the illness is severe, then the best a skilled doctor can do is ensure a low probability of recovery, while a less skilled doctor will do even worse than that. If the illness is not severe, then it doesn't matter whether the patient gets a very skilled doctor or not; they are likely to recover either way.
In reality, the patient covariate \(\theta_n\) would not be a simple binary number; it would be features of the illness (e.g., tumour size and location, age of the patient), and doctors might have skills along multiple dimensions. Binary numbers suit our thought experiment for the moment, but we'll revisit the dimensionality of covariates later.
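The simulation code isn't reproduced here, but below is a minimal sketch of a generative process consistent with the description above. The parameter values (the recovery probabilities, the assignment noise `p_match`, and the 50/50 rates of skill and severity) are illustrative guesses, not the values used to produce the tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes and parameters (illustrative, not the original values).
D, N = 10, 1000                        # doctors, patients
p_high, p_low, p_very_low = 0.9, 0.1, 0.02
p_match = 0.9                          # chance the assignment matches severity to skill

beta = rng.binomial(1, 0.5, size=D)    # doctor skill: 1 = highly skilled
theta = rng.binomial(1, 0.5, size=N)   # patient covariate: 1 = severe illness

skilled = np.flatnonzero(beta == 1)
unskilled = np.flatnonzero(beta == 0)

assignment = np.empty(N, dtype=int)
recovery = np.empty(N, dtype=int)
for n in range(N):
    # Noisy assignment: with probability p_match, severe patients go to a
    # skilled doctor and non-severe patients to the rest; otherwise the reverse.
    if rng.random() < p_match:
        pool = skilled if theta[n] == 1 else unskilled
    else:
        pool = unskilled if theta[n] == 1 else skilled
    assignment[n] = rng.choice(pool)

    # Recovery depends on severity and on the assigned doctor's skill.
    if theta[n] == 0:
        p = p_high
    elif beta[assignment[n]] == 1:
        p = p_low
    else:
        p = p_very_low
    recovery[n] = rng.binomial(1, p)
```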
Using this generative model and a given set of parameters \((p_\mathrm{high}, p_\mathrm{low}, p_\mathrm{very\; low}, N, D, \theta, \beta)\), I generated 10 doctors and 1000 patients (the data you see in the table above, actually). Now, let's pretend that we don't know \(\beta\), the doctor skills, which are what we are trying to estimate from the data alone. Would you want to see doctor 4? Naively, no, but who knows, they might be the top specialist in the country for your illness. I consider how to answer that question next.

Causality Assumptions
We need to make a set of assumptions to allow us to perform a causal analysis. I'm writing them out explicitly for good statistical hygiene but won't discuss them at length here. First, we need the stable unit treatment value assumption (SUTVA), which says that there are no interactions between units (patients), in the sense that the outcome of treatment for patient \(n\) does not depend on the assignment or recovery of any of the other patients. This might conceivably be broken if doctors become tied up with patients, limiting their ability to see new patients. Second, we need unconfoundedness, which says that assignment is conditionally independent of the outcome (i.e., recovery) given the patient covariates (i.e., judged severity of illness). This might be broken if there are some unknown patient covariates (e.g., where they live) that affect assignment. And the bad news from Imbens & Rubin (2015) is that we can never tell whether this is the case. Third, we need individualistic and probabilistic assignments (see Imbens & Rubin 2015).

Recovery Rate Stratification
Here is what the recovery rates look like for each doctor when we stratify by patient severity:

doctor id | num severe patients | recovery rate (non-severe) | recovery rate (severe) |
---|---|---|---|
0 | 103 | 0.857 | 0.117 |
1 | 7 | 0.885 | 0.000 |
2 | 87 | 0.833 | 0.080 |
3 | 78 | 0.900 | 0.128 |
4 | 92 | 1.000 | 0.098 |
5 | 6 | 0.877 | 0.000 |
6 | 11 | 0.923 | 0.000 |
7 | 97 | 0.917 | 0.041 |
8 | 11 | 0.923 | 0.000 |
9 | 8 | 0.909 | 0.000 |
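As a sketch of how such a stratification might be computed, assuming the simulated patients live in a pandas DataFrame with columns `doctor`, `severe`, and `recovered` (the column names are mine, not necessarily those of the original analysis):

```python
import pandas as pd

def stratified_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Recovery rate of each doctor within each severity stratum."""
    return (
        df.groupby(["doctor", "severe"])["recovered"]
          .mean()
          .unstack("severe")
          .rename(columns={0: "recovery rate (non-severe)",
                           1: "recovery rate (severe)"})
    )

def stratified_effect(df: pd.DataFrame, doctor: int) -> float:
    """One possible stratified effect estimate: within each severity
    stratum, compare this doctor's recovery rate with everyone else's,
    then average the differences weighted by the doctor's own case mix."""
    mine = df[df["doctor"] == doctor]
    others = df[df["doctor"] != doctor]
    effect = 0.0
    for severe, group in mine.groupby("severe"):
        baseline = others.loc[others["severe"] == severe, "recovered"].mean()
        effect += (len(group) / len(mine)) * (group["recovered"].mean() - baseline)
    return effect
```

The `stratified_effect` function is only one choice of estimand; the exact computation behind the per-doctor treatment effects below may differ.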
From the stratified rates we can also compute an average treatment effect for each doctor. Here it is alongside the true doctor skill used in the simulation:

doctor id | avg treatment effect | doctor skill | num treated | avg recovery rate |
---|---|---|---|---|
0 | 0.000 | 1 | 117 | 0.205 |
1 | -0.024 | 0 | 85 | 0.812 |
2 | 0.000 | 1 | 93 | 0.129 |
3 | 0.023 | 1 | 88 | 0.216 |
4 | -0.020 | 1 | 98 | 0.153 |
5 | -0.080 | 0 | 112 | 0.830 |
6 | 0.034 | 0 | 89 | 0.809 |
7 | -0.064 | 1 | 109 | 0.138 |
8 | 0.010 | 0 | 102 | 0.824 |
9 | 0.037 | 0 | 107 | 0.841 |
Propensity Score Reweighting
Another strategy for causal analysis is to account for the assignment mechanism using propensity scores. A propensity score is defined as the coarsest function \(e\) satisfying: \(Y_n(\cdot)\) is conditionally independent of \(Z_{n\cdot}\) given \(e(\theta_n)\). Coarseness here refers to the size of the image of the function, and it matters when considering high-dimensional covariates. In this example, we'd like \(e(\theta_n)\) to be the probability of being assigned a highly skilled doctor. But we don't know who is highly skilled in the first place (that's the unknown \(\beta\)), so we have to make do with \(e(\theta_n)\) being a vector of length \(D\) representing the probability of being assigned to each doctor. Here is the propensity score as a function of \(\theta_n\) (split into two tables to fit on the page):

unit covariate | doc_0 | doc_1 | doc_2 | doc_3 | doc_4 |
---|---|---|---|---|---|
non-severe | 0.028 | 0.156 | 0.012 | 0.020 | 0.012 |
severe | 0.206 | 0.014 | 0.174 | 0.156 | 0.184 |
unit covariate | doc_5 | doc_6 | doc_7 | doc_8 | doc_9 |
---|---|---|---|---|---|
non-severe | 0.212 | 0.156 | 0.024 | 0.182 | 0.198 |
severe | 0.012 | 0.022 | 0.194 | 0.022 | 0.016 |
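One way to turn these scores into per-doctor effect estimates (the exact computation isn't shown here, so treat this as a plausible reading of "propensity score reweighting") is to take "being assigned to doctor d" as the treatment and reweight recovery outcomes by the inverse propensity of that assignment. A minimal sketch, with array names of my own choosing:

```python
import numpy as np

def ipw_effect(assignment, recovery, theta, propensity, doctor):
    """Inverse-propensity-weighted comparison of patients treated by
    `doctor` against patients treated by anyone else.

    `propensity[s, d]` is the probability that a patient with covariate
    s (0 = non-severe, 1 = severe) is assigned to doctor d.
    """
    e = propensity[theta, doctor]   # P(assigned to this doctor | covariate), per patient
    treated = assignment == doctor

    # Reweight each group so its covariate mix resembles the whole population.
    treated_recovery = np.average(recovery[treated], weights=1.0 / e[treated])
    untreated_recovery = np.average(recovery[~treated], weights=1.0 / (1.0 - e[~treated]))
    return treated_recovery, untreated_recovery, treated_recovery - untreated_recovery
```

The table below lists, for each doctor, the reweighted recovery rate of their own patients ("treated recovery"), the reweighted recovery rate of everyone else's patients ("untreated recovery"), and the difference between the two, again with the true doctor skill alongside: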
doctor id | doctor skill | treated recovery | untreated recovery | avg treatment effect |
---|---|---|---|---|
0 | 1 | 0.487 | 0.489 | -0.003 |
1 | 0 | 0.442 | 0.495 | -0.053 |
2 | 1 | 0.457 | 0.494 | -0.037 |
3 | 1 | 0.514 | 0.489 | 0.025 |
4 | 1 | 0.549 | 0.491 | 0.058 |
5 | 0 | 0.439 | 0.497 | -0.058 |
6 | 0 | 0.462 | 0.492 | -0.030 |
7 | 1 | 0.479 | 0.498 | -0.019 |
8 | 0 | 0.462 | 0.492 | -0.030 |
9 | 0 | 0.455 | 0.493 | -0.038 |