My Research

I am studying causal inference, specifically looking at its applications to epidemiology, and even more specifically to the realm of optimal dynamic treatment regimes.

What is Causal Inference?

As the above strip by everyone's favourite NASA-roboticist-turned-webcomic-author indicates, modern statistical methods are very good at detecting correlations (associations) between random variables, but considerably weaker at establishing whether those associations are in fact causal relationships, which is a much stronger condition.

One way of distinguishing a causal relationship from a correlation is to consider the effect of an intervention. Loosely speaking, if A has a causal effect on B, then if we intervene in A (i.e. forcibly alter its value), we will observe a corresponding change in B. If B remains unchanged, then there is no causal effect. This is the principle behind randomised controlled trials. Subjects are randomly assigned to a "treatment" group or a "control" group, the intervention is applied only to the treatment group, and the outcomes of the two groups are then compared. A discrepancy is evidence for a causal link between the treatment and the outcome.
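To make the contrast concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the function names, the hidden variable U, and the effect sizes (a true treatment effect of 1.0 and a confounder effect of 2.0) are assumptions, not results from any real study. It compares the treated-versus-control difference in mean outcome when treatment is randomised with the same difference when treatment uptake depends on U.

```python
import random


def simulate(n, randomised, seed=0):
    """Simulate n hypothetical subjects; return (treated, control) outcome lists."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hidden characteristic U that raises the outcome (assumed effect: +2.0).
        u = rng.random() < 0.5
        if randomised:
            # Intervention: a coin flip decides treatment, independently of U.
            a = rng.random() < 0.5
        else:
            # Observation only: subjects with U are far more likely to be treated.
            a = rng.random() < (0.8 if u else 0.2)
        # Outcome: assumed true causal effect of treatment is +1.0, plus noise.
        y = 1.0 * a + 2.0 * u + rng.gauss(0, 1)
        (treated if a else control).append(y)
    return treated, control


def mean_difference(treated, control):
    """Difference in mean outcome between the treated and control groups."""
    return sum(treated) / len(treated) - sum(control) / len(control)


rct_estimate = mean_difference(*simulate(20000, randomised=True))
observational_estimate = mean_difference(*simulate(20000, randomised=False))
print("randomised trial:", round(rct_estimate, 2))
print("observational:   ", round(observational_estimate, 2))
```

In this toy set-up the randomised comparison should land close to the assumed effect of 1.0, while the observational comparison is inflated because U drives both treatment and outcome.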

This is great, but often these intervention studies are unfeasible. Sometimes they're too costly, sometimes they're unethical (for example, forcing people to smoke in order to determine whether there's a causal link between smoking and lung cancer), and sometimes it's just impossible to intervene in the "treatment" variable, such as when investigating the causal effect of gender on heart disease.

Instead, we may have ready access only to observational data: for example, data on the smoking habits of all patients diagnosed with lung cancer in the UK in a particular year. On the face of it, the most we can glean from this kind of data is a correlation between smoking and lung cancer.

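However, using some clever statistical methods, and under suitable assumptions (for instance, that the variables influencing both treatment and outcome have been measured), we can in fact infer causal relationships from observational data, as though it had actually come from an intervention study. These clever statistical methods are the main focus of causal inference.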
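One of the simplest such methods is standardisation: estimate the treatment effect separately within each level of a measured confounder, then average those effects over the confounder's distribution. The sketch below is illustrative only. The data are simulated, the names are hypothetical, and it assumes that the single binary variable Z is the only confounder and that it has been measured, which is a strong assumption in practice.

```python
import random


def observational_data(n, seed=0):
    """Simulate (z, a, y) rows where Z influences both treatment A and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                   # measured confounder
        a = int(rng.random() < (0.8 if z else 0.2))   # treatment depends on Z
        y = 1.0 * a + 2.0 * z + rng.gauss(0, 1)       # assumed true effect of A: +1.0
        rows.append((z, a, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_estimate(rows):
    """Raw treated-minus-untreated difference, ignoring the confounder."""
    return (mean([y for _, a, y in rows if a == 1])
            - mean([y for _, a, y in rows if a == 0]))


def standardised_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    effect = 0.0
    for level in {z for z, _, _ in rows}:
        stratum = [(a, y) for z, a, y in rows if z == level]
        difference = (mean([y for a, y in stratum if a == 1])
                      - mean([y for a, y in stratum if a == 0]))
        effect += difference * len(stratum) / len(rows)
    return effect


rows = observational_data(20000)
print("naive:       ", round(naive_estimate(rows), 2))
print("standardised:", round(standardised_estimate(rows), 2))
```

Here the naive comparison mixes the effect of A with the effect of Z, whereas the standardised estimate should sit near the assumed effect of 1.0. With unmeasured confounders, no amount of stratification on Z alone would rescue the estimate.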

What are Optimal Dynamic Treatment Regimes?

A treatment regime is a sequence of treatment decisions applied to a single subject over several time points. An example is the sequence of drug combinations prescribed to an HIV+ patient as they visit their doctor for regular check-ups.

A dynamic treatment regime is one where the treatment decisions depend on the subject's history. The doctor in our HIV example will realistically prescribe drugs based on the patient's health at the time of the check-up, as well as on past dosages.
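In code, the distinction is simply whether the decision rule takes the subject's history as an input. The following sketch is purely illustrative: the health measurement, the threshold of 350 and the "once started, keep treating" rule are hypothetical choices made for the example, not clinical guidance.

```python
from typing import List, Tuple

# One (health measurement, treatment given) pair per past visit.
History = List[Tuple[float, int]]


def static_regime(visit: int) -> int:
    """A static regime: treat at every visit, whatever has happened so far."""
    return 1


def dynamic_regime(current_health: float, history: History,
                   threshold: float = 350.0) -> int:
    """A dynamic regime: the decision depends on current health and past treatment.

    Treat (return 1) if treatment has already been started at an earlier
    visit, or if the current health measurement is below the threshold.
    """
    already_treated = any(treatment == 1 for _, treatment in history)
    return 1 if already_treated or current_health < threshold else 0


# Apply the dynamic rule over a hypothetical sequence of check-ups.
history: History = []
for health in [500.0, 420.0, 330.0, 410.0]:
    decision = dynamic_regime(health, history)
    history.append((health, decision))
print(history)
```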

We are interested in optimising these decisions in order to maximise the value of some outcome. In the HIV example, the outcome could be time until progression to AIDS, or survival time.

In trying to optimise over some set of feasible dynamic treatment regimes, we need to have knowledge of the true causal effects of the possible treatment decisions. This is where causal inference comes in.
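If the causal effects were known exactly, the optimisation itself would be a straightforward search. The toy sketch below assumes a completely made-up, deterministic model of how health responds to treatment (treatment helps only when health is below 5 and always carries a side-effect cost of 0.5), and picks the best rule from a small set of candidate regimes of the form "treat whenever health is below a threshold". All names and numbers are hypothetical.

```python
def simulate_outcome(threshold, initial_health, visits=4):
    """Follow one subject under the rule 'treat whenever health < threshold'.

    Assumed (hypothetical) causal model:
      - untreated: health falls by 1 per visit;
      - treated:   health rises by 2 if it is below 5, otherwise stays put;
      - every treatment incurs a side-effect cost of 0.5.
    The outcome is final health minus the accumulated cost.
    """
    health, cost = initial_health, 0.0
    for _ in range(visits):
        if health < threshold:   # the regime says: treat
            if health < 5:
                health += 2
            cost += 0.5
        else:                    # the regime says: do not treat
            health -= 1
    return health - cost


def regime_value(threshold, initial_healths):
    """Average outcome of a regime over a population of starting states."""
    outcomes = [simulate_outcome(threshold, h0) for h0 in initial_healths]
    return sum(outcomes) / len(outcomes)


def best_threshold(candidates, initial_healths):
    """Pick the candidate regime with the highest average outcome."""
    return max(candidates, key=lambda k: regime_value(k, initial_healths))


population = [3, 6]
candidates = [0, 3, 5, 100]   # 0 means never treat, 100 means always treat
for k in candidates:
    print("threshold", k, "value", regime_value(k, population))
print("best threshold:", best_threshold(candidates, population))
```

In this toy model a regime that adapts to the subject's current health does better than either "never treat" or "always treat". The hard part in practice is that the response model inside simulate_outcome is not known and has to be estimated from data.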