Data Collection

Data intended for capture-recapture analysis must preserve the overlap information among the sources. Traditional epidemiologic data collection, when it involves more than one data source, discards this information: when the same case appears in more than one source, the duplicates are usually removed from the database to produce an aggregate number of cases. This overlap among sources, however, is the key component of capture-recapture analysis. During data collection, all duplicate cases should therefore be kept in the data set with a variable indicating the data source, and the investigators need to keep track of the intersection of ascertainment sources.

In addition, the data should be collected so that enough information is available to link or match the same individuals across different sources. In other words, the data set should contain variables such as name, age, or address that can be used to perform matching. In many countries, confidentiality laws may make it impossible to obtain names. However, adequate matching can often be done on other variables, for example "a 39 year old white male, suffering a spinal cord injury, at 1:00 AM in Willionsville, New York on May 29, 1994, on Main and Ellicout Street." Clearly, even without a name, such a description provides sufficient information for matching.
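As an illustration of the record layout described above, the following Python sketch keeps every report in the data set with a source variable, matches reports on non-name variables, and tabulates how many cases fall in each intersection of sources. The records, the field names (`source`, `age`, `sex`, `injury_date`, `town`) and the helper functions are hypothetical and chosen only for illustration. Exact matching on a normalized key is a simplifying assumption; real data may call for more tolerant linkage rules.

```python
from collections import defaultdict

# Hypothetical reports: duplicates across sources are kept, not removed,
# and each report carries a variable naming the source it came from.
records = [
    {"source": "hospital", "age": 39, "sex": "M",
     "injury_date": "1994-05-29", "town": "Willionsville"},
    {"source": "hospital", "age": 52, "sex": "F",
     "injury_date": "1994-06-02", "town": "Buffalo"},
    {"source": "police", "age": 39, "sex": "M",
     "injury_date": "1994-05-29", "town": " willionsville "},
    {"source": "police", "age": 27, "sex": "M",
     "injury_date": "1994-06-10", "town": "Amherst"},
]

# Assumed matching variables; no names are needed.
MATCH_FIELDS = ("age", "sex", "injury_date", "town")


def match_key(record):
    """Build a normalized key from the matching variables."""
    return tuple(str(record[field]).strip().lower() for field in MATCH_FIELDS)


def capture_histories(records):
    """Map each matched individual to the set of sources that reported them."""
    sources_by_case = defaultdict(set)
    for record in records:
        sources_by_case[match_key(record)].add(record["source"])
    return dict(sources_by_case)


def overlap_counts(histories):
    """Count individuals in each intersection of ascertainment sources."""
    counts = defaultdict(int)
    for sources in histories.values():
        counts[frozenset(sources)] += 1
    return dict(counts)


histories = capture_histories(records)
counts = overlap_counts(histories)
for sources, n in sorted(counts.items(), key=lambda item: sorted(item[0])):
    print(sorted(sources), n)
```

The resulting counts (cases found by both sources, and by each source alone) are the overlap information that would be lost if duplicates were removed during data collection.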