Research
Group Testing
This is a thing I wrote a while ago for a research proposal.
You are in charge of a large army. Suppose you wish to screen your soldiers for a rare disease by taking samples of their blood. One approach would be to test each soldier's blood individually; however, conducting so many tests would be costly and time-consuming, and almost all the tests would come back negative. Instead, a better strategy would be to pool blood samples together and test these pooled samples - a testing pool consisting of entirely disease-free blood would yield a negative result, while a testing pool containing any diseased blood would yield a positive result. (The test is usually assumed to be perfectly reliable - an assumption I hope to relax.) After a relatively small number of these pooled tests, it should be possible to deduce which of the original blood samples carried the disease.
Determining effective ways to pool samples in order to detect with certainty which soldiers have the disease (or are defective) is called group testing. Group testing is traditionally thought of as a combinatorial problem, the task being to find a combination of testing pools that ensures accurate reconstruction of the defective set.
While the theory of group testing has obvious applications in pathology, it is also applicable to DNA sequencing in genomics, to spectrum manage-ment in engineering, and to population modelling in statistics, as well as having theoretical interest to combinatorialists and probabilists.
Recent work by Atia and Saligrama, and in Bristol by Sejdinovic and Johnson, has outlined a possible connection between group testing and the informa-tion theoretic problem of communication over a noisy channel. In information theory, we wish to deduce the message sent from a series of transmitted signals; similarly, in group testing, we wish to deduce the defective set from a series of test outcomes.
The rich probabilistic theory developed around the problem of communication could lead to important insights into group testing, especially in areas where traditional combinatorial techniques have fallen short. Two areas show early promise: dealing with unreliable tests, and finding pooling schemes that are computationally feasible, rather than just theoretically possible. Noisy testing models assume that the tests do not always work perfectly, but that erroneous outcomes can occur (based on some probabilistic model), similar to the noisy communication channels of information theory. In fact, an information theoretic approach could potentially offer insight into a much wider class of testing models, such as testing for multiple diseases at once, allowing the use of feedback (past testing results influencing the make-up of future pools), and requiring only partial reconstruction of the defective set. Practical algorithms for group testing are obviously of vital importance for real-life applications, an issue that combinatorialists have rarely addressed. Modern information theoretic decoding algorithms, such as message passing and its variants, offer great promise here.
Exploration of the connection between group testing and information theory is at a very early stage. In particular, there is currently no group testing equivalent of Shannon's celebrated channel coding theorem, which gives the maximum possible rate of communication in terms of statistical characteristics of a channel model. A "Shannon's theorem of group testing" could determine how many tests are needed to recover the defective set in terms of statistical characteristics of test outcomes.
I developed an interest in group testing while studying for my PhD in information theory. However, due to funding limitations from an industrial sponsor, I was unable to spend as much time working in this area as I would have liked. A recent part-time research assistant post with Sejdinovic has given me the opportunity to become well-acquainted with the group testing literature, and I have started to spend some time working on this problem, with promising early results.
In particular, I have identified an important condition under which many information theoretic principles can be applied. Although I have been able to prove a "Shannon theorem" for some very simple models, a complete theory for the general case would be of great value in both group testing and information theory. Specifically, proving theorems where this condition fails to hold is vital to making the work more widely applicable. I hope that the Heilbronn fellowship programme at the University of Bristol will give me the time and freedom to make further breakthroughs.