
Statements entered by the user are prefixed with "[i]", and GLIM output is prefixed with "[o]". The first step in a GLIM program is to specify the number of data points in the data set with the "unit" statement. With k sources there are 2^k data points; this example has 3 sources, so there are 2^3 = 8.
[i] ? $unit 8$
Next comes data input. 'p1' denotes the first source (animal control), 'p2' the second source (hospital), 'p3' the third source (police/victim reports), and 'r' the number of cases for each combination of sources. The 'data' statement defines the variable name and the 'read' command reads the data values into that variable. Below, the first 'read' statement reads the values in the first column of Table 1 into 'p1', the second reads the second column into 'p2', and so on. Because the last element of 'r' (the number of cases observed by none of the sources) is unobservable, any number may be used as its input value; here '0' is used.
[i] ? $data p1$read 0 1 0 1 0 1 0 1$
[i] ? $data p2$read 0 0 1 1 0 0 1 1$
[i] ? $data p3$read 0 0 0 0 1 1 1 1$
[i] ? $data r$read 1 7 15 326 27 323 91 0$
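The indicator columns above follow a regular pattern in which 'p1' changes fastest. As a rough illustration outside GLIM, the Python sketch below generates the same 2^k columns for any number of sources k. The function name `source_patterns` is ours and is not part of GLIM.

```python
def source_patterns(k):
    """Return k indicator columns of length 2**k; the first column changes fastest."""
    n = 2 ** k
    # Bit j of the row index gives the indicator for source j+1.
    return [[(row >> j) & 1 for row in range(n)] for j in range(k)]

p1, p2, p3 = source_patterns(3)
print(p1)
print(p2)
print(p3)
```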
Because it is unobservable, the last data point is not used in model fitting; a weight variable specifies which data points are excluded from the analysis. The following 'data' and 'read' statements read in 'w', which serves as the weight variable. A data point whose weight is '0' is not included. In this example, only the last data point is excluded.
[i] ? $data w$read 1 1 1 1 1 1 1 0$
The next step is to specify the response (y) variable, which in this example is 'r', and to define the error distribution as Poisson and the weight variable as 'w'.
[i] ? $yvar r$err p$wei w$
Now we can fit different models to the data set with the 'fit' command. The first is the independent model, that is, the model with the three main effects only and no interaction terms. Its deviance is 49.22 on 3 degrees of freedom (p-value < 0.05), so it does not fit well and interaction terms should be introduced.
[i] ? $fit p1+p2+p3$
[o] scaled deviance = 49.220 at cycle 4
[o] residual df = 3 from 7 observations
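The p-value quoted for this deviance can be checked by referring it to a chi-square distribution with 3 degrees of freedom. The sketch below is our own illustration, not GLIM output. It uses the closed-form upper-tail probability that holds for exactly 3 degrees of freedom, and the helper name `chi2_sf_3df` is hypothetical.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Scaled deviance of the independent model, taken from the GLIM output above.
p_value = chi2_sf_3df(49.220)
print(p_value < 0.05)
```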
Since the independent model does not fit well, we next fit the model containing all 2-way interaction terms (the 3-way interaction is omitted); with 7 observed data points this model is saturated. Backward elimination is then used to remove the nonsignificant interaction terms. The term 'p1.p2' denotes the interaction of 'p1' and 'p2'.
[i] ? $fit +p1.p2+p1.p3+p2.p3$
[o] scaled deviance = 2.331e-15(change =-49.22)at cycle 2
[o] residual df = 0 (change = -3 ) from 7 observations
Backward elimination begins by removing the interaction of the first and second sources. Removing 'p1.p2' changes the deviance by 0.82 on 1 degree of freedom, which indicates that 'p1.p2' is not significant, so it is dropped from the model.
[i] ? $fit -p1.p2$
[o] scaled deviance = 0.82089 (change =+0.8209) at cycle 3
[o] residual df = 1(change = +1 ) from 7 observations
Next, remove the interaction of the first and third sources. The deviance changes by 3.00 on 1 degree of freedom, so again the term is not significant at the 0.05 level.
[i] ? $fit -p1.p3$
[o] scaled deviance = 3.8174 (change = +2.996) at cycle 3
[o] residual df = 2 (change = +1 ) from 7 observations
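Each elimination step compares a change in deviance with a chi-square distribution on 1 degree of freedom. The sketch below is our own check rather than part of the GLIM session. It computes the corresponding p-values from the two changes reported above, and the helper name `chi2_sf_1df` is hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Changes in scaled deviance reported by GLIM when each term is removed.
changes = {"p1.p2": 0.8209, "p1.p3": 2.996}
p_values = {term: chi2_sf_1df(change) for term, change in changes.items()}
for term, p in p_values.items():
    print(term, "not significant at 0.05" if p > 0.05 else "significant at 0.05")
```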
A 3-source model has only three 2-way interaction terms. Since the independent model does not fit well and neither 'p1.p2' nor 'p1.p3' is significant, the interaction of the second and third sources ('p2.p3') must be significant. The final model is therefore p1+p2+p3+p2.p3. We now look at the parameter estimates for the selected model. The command 'd' stands for display, and 'e' for estimates.
[i] ? $d e$
[o] estimate s.e. parameter
[o] 1 -0.7090 0.3831 1
[o] 2 2.725 0.1574 P1
[o] 3 3.752 0.3576 P2
[o] 4 3.778 0.3575 P3
[o] 5 -2.311 0.4044 P2.P3
[o] scale parameter 1.000
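For readers without GLIM, the selected model can be refitted with a small iteratively reweighted least squares routine. The Python sketch below is a minimal illustration under our own assumptions. The function names `solve` and `fit_poisson`, the starting values, and the fixed number of cycles are ours and do not describe GLIM's internal algorithm.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poisson(X, y, w, cycles=25):
    """Weighted Poisson log-linear fit; returns (estimates, fitted values, deviance)."""
    p = len(X[0])
    eta = [math.log(max(yi, 0.5)) for yi in y]  # crude starting values (assumption)
    for _ in range(cycles):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        W = [wi * m for wi, m in zip(w, mu)]                    # working weights
        A = [[sum(Wi * xi[a] * xi[b] for Wi, xi in zip(W, X)) for b in range(p)]
             for a in range(p)]
        rhs = [sum(Wi * xi[a] * zi for Wi, xi, zi in zip(W, X, z)) for a in range(p)]
        beta = solve(A, rhs)
        eta = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in X]
    mu = [math.exp(e) for e in eta]
    dev = 2 * sum(wi * ((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m))
                  for wi, yi, m in zip(w, y, mu))
    return beta, mu, dev

p1 = [0, 1, 0, 1, 0, 1, 0, 1]
p2 = [0, 0, 1, 1, 0, 0, 1, 1]
p3 = [0, 0, 0, 0, 1, 1, 1, 1]
r = [1, 7, 15, 326, 27, 323, 91, 0]
w = [1, 1, 1, 1, 1, 1, 1, 0]

# Design matrix for p1+p2+p3+p2.p3: intercept, P1, P2, P3, P2.P3.
X = [[1, a, b, c, b * c] for a, b, c in zip(p1, p2, p3)]
beta, mu, dev = fit_poisson(X, r, w)
print([round(b, 3) for b in beta])
print(round(dev, 4))
```

The eighth row has weight 0, so it does not influence the fit, but its fitted value `mu[7]` is still computed from the estimates.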
Finally, check the estimated number of cases missed by all three sources. Since the last data point (the 8th) corresponds to the number of cases observed by none of the sources, its fitted (estimated) value gives the estimate. '%fv' is a system-defined variable holding the fitted values, and 'look 8' displays the eighth fitted value. The estimate is therefore that 1388 dog bite injuries were missed by all three reporting sources in Pittsburgh.
[i] ? $look 8 %fv$
[o] %FV
[o] 8 1388.
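The fitted value can be cross-checked by hand. Under the selected model, 'p1' does not interact with 'p2' or 'p3', so the ratio of cases missed by the first source to cases identified by it should be the same for every combination of the other two sources. The sketch below is our own illustration, not part of the GLIM session. It assumes that '1' marks 'not identified', since the unobservable data point is the one where all three indicators are 1, and it applies that ratio to the cases missed by the second and third sources.

```python
# Counts from the 'r' variable, keyed by (p1, p2, p3); 1 = not identified (assumption).
r = {(0, 0, 0): 1, (1, 0, 0): 7, (0, 1, 0): 15, (1, 1, 0): 326,
     (0, 0, 1): 27, (1, 0, 1): 323, (0, 1, 1): 91}

# Cases seen or missed by source 1, over the cells where both values are observable.
seen_by_p1 = sum(n for (a, b, c), n in r.items() if a == 0 and (b, c) != (1, 1))
missed_by_p1 = sum(n for (a, b, c), n in r.items() if a == 1)

# Scale the cases missed by sources 2 and 3 (but seen by source 1) by that ratio.
missed_by_all = r[(0, 1, 1)] * missed_by_p1 / seen_by_p1
print(round(missed_by_all))
```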
Note that if '0' is used to indicate 'identified' and '1' to indicate 'not identified' in the data set, the estimated number of missing cases can also be calculated as exp(-0.709+2.725+3.752+3.778-2.311), that is, the exponential of the sum of all the parameter estimates.
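The calculation in this note is short enough to sketch in Python. The values are copied from the parameter display above (which lists 3.778 for P3). Because the displayed estimates are rounded, the result agrees with the fitted value of 1388 only approximately.

```python
import math

# Estimates for 1, P1, P2, P3 and P2.P3, as displayed by '$d e$'.
estimates = [-0.7090, 2.725, 3.752, 3.778, -2.311]
missed = math.exp(sum(estimates))
print(round(missed))
```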