Capture-recapture methods have a long history; they were first applied in the study of fish and wildlife populations before being adapted for other purposes. The application of these methods to epidemiologic problems came relatively late in this history and has thus been able to draw on advances in the other areas as well as in statistical methods more broadly. The simplest capture-recapture model is the so-called two-sample model, used solely to estimate the unknown size of a population. The first sample provides the individuals for marking or tagging and is returned to the population, while the second sample provides the recaptures. Using the number of individuals caught in both samples (the recaptures) and the numbers caught in just one sample, it is possible to estimate the number not caught in either sample, thus providing an estimate of the total population size.

The assumptions required for this estimate to be valid can be spelt out in a number of ways. However, the key ingredients are: (i) There is no change to the population during the investigation (the population is closed). (ii) There is no loss of tags (individuals can be matched from capture to recapture). (iii) For each sample, each individual has the same chance of being in the sample. (iv) The two samples are independent. Assumption (iv) really follows from (iii), since the latter implies that marked and unmarked individuals have the same probability of being caught in the second sample, so that capture in the first sample does not affect capture in the second sample; that is, the samples are independent. However, it is convenient to list (iv) separately.
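The two-sample calculation described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions (i) to (iv), using the usual form of the estimate (the product of the two sample sizes divided by the number of recaptures); the function name, argument names and the numbers are hypothetical and are not taken from any study.

```python
def petersen_estimate(n1, n2, m):
    """Two-sample estimate of the size of a closed population.

    n1: number caught (and marked) in the first sample
    n2: number caught in the second sample
    m:  number caught in both samples (the recaptures)
    """
    if m <= 0:
        raise ValueError("at least one recapture is needed to form the estimate")
    if m > min(n1, n2):
        raise ValueError("recaptures cannot exceed either sample size")
    total = n1 * n2 / m               # estimated population size
    missed = (n1 - m) * (n2 - m) / m  # estimated number caught in neither sample
    return total, missed


# Illustrative numbers only.
total, missed = petersen_estimate(n1=200, n2=150, m=30)
print(total, missed)
```

The estimated total equals the number of distinct individuals seen (n1 + n2 - m) plus the estimated number missed by both samples, which is the quantity the method is designed to recover.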

In ecology, the method is generally called the Petersen method because of Petersen's work in
1894 associated with tagged fish, though its first use in fisheries was by Dahl in 1917. It was also
used by Lincoln in 1930 to estimate the size of a duck population [Le Cren (1)]. In 1949, Sekar and Deming (2) used the method
to estimate birth and death rates and the extent of registration. Their paper may be
regarded as the first serious application of the capture-recapture method to human health and has
a good discussion on some of the practical problems associated with the method. Using a similar
approach, Shapiro (3) applied the technique to birth registration in the
USA using census data. There is also a substantial literature, going back to the 1940s (Tracey (4)) under the title of dual record systems or dual-system estimators,
dealing with the application of the two-sample method to census data. By taking another sample
in addition to the census, the capture-recapture method can be used for estimating undercount by
the census. The method and controversy currently surrounding its application to the US census
are described by Hogan (5). A helpful bibliography of the literature
relating to this problem is given by Fienberg (6).

The above method can, in principle, be applied to any situation where there are two incomplete
lists. One simply replaces "being caught in sample i" by "being on list i". This is the case in
epidemiology where lists can be constructed from a variety of sources such as hospital records,
doctors' medical files, medical prescriptions and so on. By their very nature these lists are
incomplete and the problem is to estimate the number of individuals missing from both lists. In spite of the above
early work, such applications to epidemiology came later, with Wittes and her colleagues (7,8) pointing out the connections.
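As a small sketch of the list version of the method, the following Python fragment treats two incomplete lists as sets of matched identifiers and estimates the number missing from both. The identifiers, list names and function name are hypothetical, and the sketch assumes that records have already been matched without error and that the two lists are independent.

```python
# Hypothetical patient identifiers; real lists need careful record matching.
hospital_list = {"p01", "p02", "p03", "p04", "p05", "p06"}
doctor_list = {"p04", "p05", "p06", "p07", "p08"}


def two_list_estimate(list_a, list_b):
    """Estimate the number of cases missing from both of two independent lists."""
    both = len(list_a & list_b)    # on both lists (the "recaptures")
    only_a = len(list_a - list_b)  # on list A only
    only_b = len(list_b - list_a)  # on list B only
    if both == 0:
        raise ValueError("the lists share no cases, so no estimate can be formed")
    missing = only_a * only_b / both
    observed = both + only_a + only_b
    return {"both": both, "only_a": only_a, "only_b": only_b,
            "missing": missing, "total": observed + missing}


result = two_list_estimate(hospital_list, doctor_list)
print(result)
```

If the lists are positively dependent, for example because certain doctors refer patients to certain hospitals, the overlap is inflated and an estimate of this kind will tend to be too small.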

With regard to applying the assumptions to epidemiology, the experiment can generally be set up
so that (i) is at least approximately true. For assumption (ii), matching will depend on the quality
of the patients' records and the uniqueness of the patients' code names. In some parts of the
world matching is a real problem. Unfortunately, assumption (iii), that each individual has the
same probability of being on a given list, is generally false; that is, patients tend to be
heterogeneous with regard to being "caught" on a list. Some methods for minimizing
heterogeneity are described later. However, even if something could be done about this,
assumption (iv) is invariably false. For example, if certain doctors refer their patients to certain
hospitals, then hospital admissions and doctors' records will not give two independent lists. This
question of dependence is discussed in detail by Sekar and Deming (2)
and Wolter (9). One can think of decomposing assumption (iii) into
two parts: dependence and heterogeneity of capture probabilities. For human populations, the
latter component has been considered only recently (10,11), although
those working in ecology and other areas had done so earlier.

In animal population studies, the 2-sample method was extended to the K-sample method. By
taking more than two samples one can utilize the information from the multiple recaptures. The
unmarked animals in each sample are now given individual marks before being returned to the
population. If one uses individual (e.g. numbered) marks then the capture history of each marked
individual is known.
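To illustrate what a capture history is, the sketch below builds one for each marked individual from K = 3 samples. The marks and the sample contents are hypothetical; each history is written as a string of 1s and 0s, one character per sample.

```python
from collections import Counter

# Hypothetical individual marks seen in each of K = 3 samples.
samples = [
    {"a", "b", "c", "d"},
    {"b", "c", "e"},
    {"c", "d", "e", "f"},
]


def capture_histories(samples):
    """Map each individual to a string such as '101' (caught in samples 1 and 3)."""
    individuals = set().union(*samples)
    return {
        ind: "".join("1" if ind in s else "0" for s in samples)
        for ind in sorted(individuals)
    }


histories = capture_histories(samples)
history_counts = Counter(histories.values())  # number of individuals per history
print(histories)
print(history_counts)
```

The history of all zeros never appears in the data, since it belongs to the individuals that were never caught; estimating how many there are is the purpose of the K-sample models.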

The first person to introduce the K-sample capture-recapture method was Schnabel in 1938 (12), in the context of fishing in a lake. She made the usual assumptions about the sampling and marking processes, such as that each sample is a simple random sample and that animals do not lose their tags. The theory of this model was developed more fully by Chapman, Darroch and others in the 1950s (13, chapter 4). However, it was recognized that some of the underlying assumptions may not hold. For example, there was the problem of heterogeneity: unmarked animals had different probabilities of being captured in a given sample, and marked animals behaved differently from unmarked ones. To cater for populations with these problems, a range of different models was introduced in the 1970s; these are associated with the names of Anderson, Burnham, Otis, White and others (see the review by Seber (14), p. 275). These models have since been added to by Chao, so that a hierarchy of eight models is now available (see the reviews by Pollock (15) and Seber (16, pp. 141-3)).
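For illustration, a K-sample estimate in the form commonly attributed to Schnabel can be sketched as follows: the sum over samples of (catch size times number of marked animals at large) divided by the total number of recaptures. This form is an assumption of the sketch rather than a quotation from the papers cited above, and the function name and numbers are hypothetical.

```python
def schnabel_estimate(catches, recaptures):
    """K-sample estimate of closed-population size.

    catches[t]:    number of animals caught in sample t
    recaptures[t]: number of those that already carried a mark
    """
    marked = 0            # marked animals at large before the current sample
    numerator = 0
    total_recaptures = 0
    for caught, recaptured in zip(catches, recaptures):
        if recaptured > caught or recaptured > marked:
            raise ValueError("inconsistent recapture count")
        numerator += caught * marked
        total_recaptures += recaptured
        marked += caught - recaptured  # unmarked animals are marked and released
    if total_recaptures == 0:
        raise ValueError("no recaptures, so no estimate can be formed")
    return numerator / total_recaptures


# Illustrative numbers only.
estimate = schnabel_estimate(catches=[50, 60, 40], recaptures=[0, 10, 15])
print(estimate)
```

With only two samples this reduces to the two-sample estimate, since the first sample contributes nothing to the numerator and the second contributes the product of the two sample sizes.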

The K-sample method has also been applied to populations that allow migration, birth, and death to take place during the period of the study (the open population). There is a very extensive and expanding literature on the subject (17,18). However, such models depend on the assumption that samples are independent. As this is not the case with lists, it is unlikely that these general models will be directly useful in epidemiology.

Another method for handling the breakdown of the assumptions is the log-linear model which was
applied by Fienberg (19) to capture-recapture data. In fact, a general
log-linear framework allows for the representation and incorporation of most of these models for
K lists, as well as some extensions for the generalization from closed to open populations (20).

Clearly the above methodology has the potential for being applied to K lists. Unfortunately we
run into the same problem again, namely that of list dependence. Current thinking would suggest
that of all the above approaches only the log-linear model has the flexibility for handling this
particular problem. However, such a model has to be used with caution, as one still needs some
assumptions to hold for the model to be useful (see the Appendix for details).
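As a hedged illustration of how a log-linear model can accommodate list dependence, consider three lists and the particular model that includes all pairwise list interactions but no three-way interaction. For this model the estimate of the unobserved cell has a standard closed form, obtained by equating the odds ratio between two lists across the two levels of the third. The sketch below applies it to hypothetical counts indexed by capture history; it is one possible model, not a recommendation, and it still rests on the assumption of no three-way interaction.

```python
def missing_cell_three_lists(n):
    """Estimate the count of cases on none of three lists.

    n maps capture histories such as '110' (on lists 1 and 2 only) to observed
    counts. Assumes a log-linear model with all pairwise interactions and no
    three-way interaction.
    """
    numerator = n["111"] * n["100"] * n["010"] * n["001"]
    denominator = n["110"] * n["101"] * n["011"]
    if denominator == 0:
        raise ValueError("a zero pairwise-overlap cell makes the estimate undefined")
    return numerator / denominator


# Hypothetical observed counts for the seven observable capture histories.
observed = {"100": 30, "010": 20, "001": 12,
            "110": 10, "101": 6, "011": 4, "111": 2}

missing = missing_cell_three_lists(observed)
total = sum(observed.values()) + missing
print(missing, total)
```

Simpler models in the same family, for example one that drops a pairwise interaction, generally give different estimates, so the choice of model matters and should be checked against the data.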

2. Sekar CC, Deming WE. On a method of estimating birth and death rates and the extent of
registration. *J Am Stat Assoc* 1949;44:101-115.

3. Shapiro S. Estimating birth registration completeness. *J Am Stat Assoc*
1950;45:261-264.

4. Tracey WR. *Fertility of the population of Canada*. Reprinted from Seventh Census of
Canada, 1931, (Vol 2), Census Monograph No. 3. Ottawa: Cloutier.

5. Hogan H. The 1990 post-enumeration survey: operations and results. *J Am Stat
Assoc* 1993;88:1047-1060.

6. Fienberg SE. Bibliography on capture-recapture modeling with application to census
undercount adjustment. *Survey Methodology* 1992;18:143-154.

7. Wittes J and Sidel VW. A generalization of the simple capture-recapture model with
applications to epidemiological research. *J Chronic Dis* 1968;21:287-301.

9. Wolter KM. Some coverage error models for census data. *J Am Stat Assoc*
1986;81:338-46.

10. Hook EB, Regal RR. Effect of variation in probability of ascertainment by sources ("variable
catchability") upon "capture-recapture" estimates of prevalence. *Am J Epidemiol*
1993;137:1148-66.

12. Schnabel ZE. The estimation of the total fish population of a lake. *Amer Math Mon*
1938;45:348-52.

13. Seber GAF. *The estimation of animal abundance and related parameters*, 2nd edit.
London:Griffin 1982.

14. Seber GAF. A review of estimating animal abundance. *Biometrics* 1986;42:267-292.

15. Pollock KH. Modeling capture, recapture and removal statistics for estimation of
demographic parameters for fish and wildlife populations: past, present and future. *J Am
Stat Assoc* 1991;86:225-238.

16. Seber GAF. A review of estimating animal abundance II. *International Statistical
Review* 1992;60:129-166.

17. Pollock KH. Modeling capture, recapture and removal statistics for estimation of
demographic parameters for fish and wildlife populations: past, present and future. *J Am Stat
Assoc* 1991;86:225-38.

18. Seber GAF. A review of estimating animal abundance II. *Int Stat Rev*
1992;60:129-66.

19. Fienberg SE. The multiple recapture census for closed populations and incomplete 2^k
contingency tables. *Biometrika* 1972;59:591-603.

20. Cormack RM. Log-linear models for capture-recapture experiments on open populations. In:
Hiorns RW, Cooke D, eds., *The mathematical theory of the dynamics of biological
populations II*. London: Academic Press, 1981.