=====================Warren Sarle, 14 Sept 1995========ssc,ssm
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Bootstrap Confidence Intervals
I am writing documentation for some new bootstrap and jackknife
software, and I want to provide some practical, nontechnical advice,
especially on the use of bootstrap confidence intervals. The literature
on this subject is rather confusing, even in its basic terminology.
Occasional posts in the sci.stat.* groups asking bootstrap questions
suggest that this material might be of broad interest. There are many
unresolved issues of both theoretical and practical natures, hence the
cross-post to sci.stat.math and sci.stat.consult.
I am not an expert on bootstrapping, and I would appreciate suggestions
and corrections from those who are.
Little of this post is specific to SAS software. There are a few places
where I mention the %BOOT macro, which computes bootstrap bias and
standard error estimates and associated confidence intervals assuming a
normal sampling distribution, and the %BOOTCI macro, which computes some
other bootstrap confidence intervals. The %JACK macro performs the basic
leave-one-out jackknife. I will send a copy of the macros to anyone
daring enough to do some alpha testing and to write some simple macros
(an oxymoron?) of their own.
Bootstrap Confidence Intervals
Copyright (C) 1995 by Warren S. Sarle, Cary, NC, USA
Confidence Intervals
--------------------
The normal (standard) bootstrap confidence interval is accurate only for
statistics with an approximately normal sampling distribution. The
%BOOTCI macro provides the most commonly used types of bootstrap
confidence intervals that:
* are suitable for statistics with nonnormal sampling distributions and
* require only a single level of resampling.
It is advisable to specify at least 1000 resamples for a 90% confidence
interval. For a higher level of confidence or for the BC and BCa
methods, even more resamples should be used.
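As a rough illustration of why the number of resamples matters, the following Python sketch (not SAS, and not a description of how %BOOTCI works) recomputes the lower limit of a 90% percentile interval 40 times, first with 100 and then with 1000 resamples, and compares the spread of the results. The sample, the seed, and the helper name lower_endpoint are assumptions invented for the illustration.

```python
import random
from statistics import stdev

def lower_endpoint(data, n_boot, rng, alpha=0.10):
    """Lower limit of a percentile interval for the mean (nearest-rank rule)."""
    n = len(data)
    boot = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    return boot[round((alpha / 2) * (n_boot - 1))]

rng = random.Random(1995)
data = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical skewed sample

# The data are fixed, so any variation in the endpoint across repetitions
# is pure resampling (Monte Carlo) noise.
spread = {}
for n_boot in (100, 1000):
    endpoints = [lower_endpoint(data, n_boot, rng) for _ in range(40)]
    spread[n_boot] = stdev(endpoints)

print(spread)
```

The endpoint computed from 1000 resamples should vary noticeably less from run to run than the one computed from 100. Limits that sit further out in the tail, as with higher confidence levels or the adjusted percentile points of BC and BCa, are noisier still.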
The terminology for bootstrap confidence intervals is confused. The
keywords used with the %BOOTCI macro follow S&T:
Keyword        Terms from the references
-------        -------------------------
PCTL or        "bootstrap percentile" in S&T;
PERCENTILE     "percentile" in E&T;
               "other percentile" in Hall;
               "Efron's `backwards' percentile" in Hjorth
HYBRID         "hybrid" in S&T;
               no term in E&T;
               "percentile" in Hall;
               "simple" in Hjorth
T              "bootstrap-t" in S&T and E&T;
               "percentile-t" in Hall;
               "studentized" in Hjorth
BC             "BC" in all
BCA            "BCa" in S&T, E&T, and Hjorth; "ABC" in Hall
               (for simple random samples and certain other
               special cases only)
There is considerable controversy concerning the use of bootstrap
confidence intervals. To fully appreciate the issues, it is important to
read S&T and Hall in addition to E&T. Asymptotically in simple random
samples, the T and BCa methods work better than the traditional normal
approximation, while the percentile, hybrid, and BC methods have the
same accuracy as the traditional normal approximation. In small
samples, things get much more complicated:
* The percentile method simply uses the alpha/2 and 1-alpha/2
percentiles of the bootstrap distribution to define the interval.
This method performs well for quantiles and for statistics that are
unbiased and have a symmetric sampling distribution. For a
statistic that is biased, the percentile method amplifies the bias.
The main virtue of the percentile method and the closely related BC
and BCa methods is that the intervals are equivariant under
transformation of the parameters. One consequence of this
equivariance is that the interval cannot extend beyond the possible
range of values of the statistic. In some cases, however, this
property can be a vice--see the "Cautionary Example" below.
* The BC method corrects the percentile interval for bias--median
bias, not mean bias. The correction is performed by adjusting the
percentile points to values other than alpha/2 and 1-alpha/2. If a
large correction is required, one of the percentile points will be
very small; hence a very large number of resamples will be required
to approximate the interval accurately. See the "Cautionary
Example" below.
* The BCa method corrects the percentile interval for bias and
skewness. This method requires an estimate of the acceleration,
which is related to the skewness of the sampling distribution. For
simple random samples, the acceleration can be estimated by
jackknifing, which of course requires extra computation. For
bootstrapping residuals in regression models, no general method for
estimating the acceleration is known. If the acceleration is not
estimated accurately, the BCa interval will perform poorly. The
length of the BCa interval is not monotonic with respect to alpha
(Hall, pp 134-135, 137). For large values of the acceleration and
large alpha, the BCa interval is excessively short. The BCa
interval is no better than the BC interval for nonsmooth statistics
such as the median.
* The HYBRID method is the reverse of the percentile method. While
the percentile method amplifies bias, the HYBRID method
automatically adjusts for bias and skewness. The HYBRID method
works well if the standard error of the statistic does not depend
on any unknown parameters; otherwise, the T method works better if
a good estimate of the standard error is available. Of all the
methods in %BOOTCI, the HYBRID method seems to be the least likely
to yield spectacularly wrong results. The HYBRID method and the
closely related T method are not equivariant under transformation
of the parameters.
* The T method requires an estimate of the standard error (or a
constant multiple thereof) of each statistic being bootstrapped.
This requires more work from the user. If the standard errors are
not estimated accurately, the T method may perform poorly. In
simulation studies, T intervals are often found to be excessively
long. E&T (p 160) claim that the T method is erratic and sensitive
to outliers.
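The single-level methods described above can be sketched in a few lines of Python. This is a minimal illustration using the textbook formulas for a simple random sample, not the code of the %BOOTCI macro. The function names, the nearest-rank quantile rule, the seed, and the exponential test sample are all assumptions made for the example. The T method is left out because it needs a standard error for every resample.

```python
import random
from statistics import NormalDist

NORMAL = NormalDist()

def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile (one simple convention among several)."""
    last = len(sorted_vals) - 1
    return sorted_vals[min(last, max(0, round(q * last)))]

def bootstrap_intervals(data, stat, n_boot=2000, alpha=0.10, seed=1):
    rng = random.Random(seed)
    n = len(data)
    est = stat(data)
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    z_lo, z_hi = NORMAL.inv_cdf(alpha / 2), NORMAL.inv_cdf(1 - alpha / 2)

    # PCTL: read the limits straight off the bootstrap distribution.
    lo, hi = quantile(boot, alpha / 2), quantile(boot, 1 - alpha / 2)
    # HYBRID: reflect the same two percentiles about the estimate.
    hybrid = (2 * est - hi, 2 * est - lo)

    # BC: z0 measures median bias through the share of bootstrap values
    # that fall below the estimate; it shifts the percentile points.
    share = sum(b < est for b in boot) / n_boot
    share = min(max(share, 0.5 / n_boot), 1 - 0.5 / n_boot)  # keep inv_cdf finite
    z0 = NORMAL.inv_cdf(share)
    bc = (quantile(boot, NORMAL.cdf(2 * z0 + z_lo)),
          quantile(boot, NORMAL.cdf(2 * z0 + z_hi)))

    # BCa: the acceleration is estimated from the leave-one-out jackknife.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(jack) / n
    num = sum((centre - j) ** 3 for j in jack)
    den = 6 * sum((centre - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def shifted(z):
        return NORMAL.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    bca = (quantile(boot, shifted(z_lo)), quantile(boot, shifted(z_hi)))
    return {"estimate": est, "pctl": (lo, hi), "hybrid": hybrid,
            "bc": bc, "bca": bca, "z0": z0, "accel": accel}

def sample_mean(values):
    return sum(values) / len(values)

rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical skewed data
intervals = bootstrap_intervals(sample, sample_mean)
for name in ("pctl", "hybrid", "bc", "bca"):
    print(name, intervals[name])
```

Note that the BC and BCa limits are still percentiles of the same bootstrap distribution; only the percentile points change. When z0 is far from zero, one of those points is pushed deep into a tail where few or no resampled values lie.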
Numerous other methods exist for bootstrap confidence intervals that
require nested resampling, i.e., each resample of the original
sample is itself reresampled multiple times. Since the total number
of reresamples required is typically 25,000 or more, these methods
are extremely expensive and have not yet been implemented in the
%BOOT and %BOOTCI macros.
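To give a feel for the cost of nested resampling, here is a hedged Python sketch of one common nested scheme: a bootstrap-t interval in which the standard error of each resample is itself estimated by re-resampling. It is only one instance of the idea and is not necessarily among the methods alluded to above. The function name, the choice of 1000 outer and 25 inner resamples, the seed, and the test sample are assumptions for illustration.

```python
import random
from statistics import stdev

def mean_of(values):
    return sum(values) / len(values)

def nested_bootstrap_t(data, stat, n_outer=1000, n_inner=25, alpha=0.10, seed=7):
    rng = random.Random(seed)
    n = len(data)
    est = stat(data)
    outer_stats, t_values = [], []
    n_reresamples = 0
    for _ in range(n_outer):
        resample = rng.choices(data, k=n)
        theta_star = stat(resample)
        # Second level: re-resample the resample to estimate its standard error.
        inner = [stat(rng.choices(resample, k=n)) for _ in range(n_inner)]
        n_reresamples += n_inner
        se_star = stdev(inner)
        outer_stats.append(theta_star)
        if se_star > 0:
            t_values.append((theta_star - est) / se_star)
    se_hat = stdev(outer_stats)  # standard error of the original estimate
    t_values.sort()
    last = len(t_values) - 1
    t_lo = t_values[round((alpha / 2) * last)]
    t_hi = t_values[round((1 - alpha / 2) * last)]
    # The upper t quantile sets the lower limit, and vice versa.
    return est, (est - t_hi * se_hat, est - t_lo * se_hat), n_reresamples

rng = random.Random(3)
sample = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical data
estimate, interval, total = nested_bootstrap_t(sample, mean_of)
print(estimate, interval, total)
```

Even with only 25 inner resamples per outer resample, 1000 outer resamples already require 25,000 evaluations of the statistic at the second level.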
A Cautionary Example
--------------------
Jackknifing and bootstrapping are no remedy for an inadequate sample
size. For nonparametric resampling methods, the sample distribution must
be reasonably close in some sense to the population distribution to
obtain accurate inferences. In parametric methods, only the estimated
parameters need be reasonably close to the population parameters to
obtain accurate inferences. The smaller the sample size, the greater the
fluctuations in the distribution of the sample. Nonparametric methods
that are sensitive to a wide variety of such fluctuations will suffer
more from small sample sizes than will parametric methods _if_ the
assumptions of the parametric methods are valid.
In this example, the purpose of the analysis is to find a 95% confidence
interval for R**2 in a linear regression with 20 observations and 10
predictors. The predictors and response are generated from a
multivariate normal distribution, so normal-theory methods are
applicable. With real data, if the distribution were not known to be
normal, you might be tempted to use the jackknife or bootstrap on the
theory that normal approximations could not be trusted in such a small
sample size. In fact, most of the jackknife and bootstrap methods cannot
be trusted either.
This example computes a 95% confidence interval with each of the methods
available in %JACK and %BOOT using 1000 resamples. The results are
assembled into a single data set called CI for comparison. PROC CANCORR
is also used to obtain a normal-theory 95% confidence interval. Two
versions of the %ANALYZE macro are shown, one with CANCORR and one with
REG; either version can be used for the analysis.
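title 'A Cautionary Example';
%let n=20;   /* number of observations */
%let p=10;   /* number of predictors */
** generate multivariate normal data with true R**2=0.1;
data x;
   array x x1-x&p;
   do n=1 to &n;
      drop n;
      do over x; x=rannor(123); end;   /* independent N(0,1) predictors */
      p=sum(of x1-x&p)/sqrt(&p);       /* standardized sum of predictors */
      e=rannor(123);                   /* independent N(0,1) error */
      y=p*sqrt(.1)+e*sqrt(.9);         /* Var(y)=1, of which 0.1 is explained */
      output;
   end;
run;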
[lots of SAS code deleted]
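For readers without SAS, the following Python sketch mimics the data-generating step above and then bootstraps R**2 with 1000 resamples. It is not a reconstruction of the deleted SAS code and will not reproduce the numbers reported below, since the random number generator differs. The function names, the Gram-Schmidt computation of R**2, the nearest-rank percentiles, and the convention for degenerate resamples are assumptions made for the illustration.

```python
import math
import random

N_OBS, N_PRED, TRUE_R2 = 20, 10, 0.1

def make_data(rng):
    """Rows of (x1..x10, y), built the same way as the SAS DATA step above."""
    rows = []
    for _ in range(N_OBS):
        x = [rng.gauss(0, 1) for _ in range(N_PRED)]
        signal = sum(x) / math.sqrt(N_PRED)
        y = signal * math.sqrt(TRUE_R2) + rng.gauss(0, 1) * math.sqrt(1 - TRUE_R2)
        rows.append(x + [y])
    return rows

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def r_squared(rows):
    """R**2 of y on all predictors (with intercept), by Gram-Schmidt on centered columns."""
    n = len(rows)
    cols = [[v - sum(col) / n for v in col] for col in zip(*rows)]
    resid = cols[-1]
    tss = dot(resid, resid)
    if tss == 0.0:
        return 1.0  # y constant: R**2 undefined, set to 1 here by arbitrary convention
    basis = []
    for col in cols[:-1]:
        scale = dot(col, col)
        for q in basis:
            c = dot(col, q)
            col = [a - c * b for a, b in zip(col, q)]
        norm2 = dot(col, col)
        if norm2 <= 1e-10 * scale:
            continue  # predictor is numerically redundant in this resample
        q = [a / math.sqrt(norm2) for a in col]
        basis.append(q)
        c = dot(resid, q)
        resid = [a - c * b for a, b in zip(resid, q)]
    return min(1.0, max(0.0, 1.0 - dot(resid, resid) / tss))

rng = random.Random(123)
data = make_data(rng)
estimate = r_squared(data)

boot = sorted(r_squared(rng.choices(data, k=N_OBS)) for _ in range(1000))
lower, upper = boot[round(0.025 * 999)], boot[round(0.975 * 999)]
pctl = (lower, upper)
hybrid = (2 * estimate - upper, 2 * estimate - lower)
perfect_fits = sum(b > 1 - 1e-9 for b in boot) / len(boot)
print(estimate, pctl, hybrid, perfect_fits)
```

A resample of 20 rows drawn with replacement often contains 11 or fewer distinct rows. In that case 10 predictors plus an intercept fit the resample exactly and R**2 equals 1, so a noticeable share of the bootstrap values in this sketch should sit at exactly 1.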
The actual sampling distribution of R**2, based on 10000 simulated data
sets, looks like this:
[Histogram: frequency (axis marked from 200 to 1000) by R**2, with bin midpoints from 0.00 to 1.00 in steps of 0.04.]
The bootstrap distribution computed from the one data set in this
example is not even close to the true sampling distribution:
[Histogram: frequency (axis marked at 100, 200, and 300) by R**2, with bin midpoints from 0.00 to 1.00 in steps of 0.04.]
The confidence intervals computed from this data set are as follows:
                        Lower         Upper
                      Confidence    Confidence
Method                  Limit         Limit
Normal theory          0.00000       0.62876
Jackknife             -0.44648       0.54393
Bootstrap Normal       0.07400       0.51324
Bootstrap Hybrid       0.18391       0.56566
Bootstrap PCTL         0.61824       1.00000
Bootstrap BC           0.51547       0.57231
Bootstrap BCa          0.51547       0.57231
Bootstrap t           -3.11368       0.56556
The true value of R**2 in this example is 0.10, the sample plug-in
estimate is 0.59, and the adjusted estimate is 0.14. The normal-theory
interval can be considered the "right answer".
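The relation between the plug-in and adjusted estimates can be checked with a short Python calculation. The post does not say which adjustment was used, so the usual degrees-of-freedom adjustment is assumed here. The unrounded plug-in estimate is backed out from the table above on the further assumption that the hybrid limits are the percentile limits reflected about the estimate. The function name is made up for the example.

```python
def adjusted_r2(r2, n_obs, n_pred):
    """Usual degrees-of-freedom adjustment for R**2 in a model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)

# If hybrid lower = 2*estimate - PCTL upper, then
# estimate = (hybrid lower + PCTL upper) / 2.
plug_in = (0.18391 + 1.00000) / 2
adjusted = adjusted_r2(plug_in, n_obs=20, n_pred=10)
print(round(plug_in, 2), round(adjusted, 2))  # prints 0.59 0.14
```

Under these assumptions the calculation agrees with the two estimates quoted above.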
The jackknife interval has a negative lower limit, and the upper limit
is rather low, but the interval covers the true value.
The bootstrap interval based on a normal approximation is short but does
cover the true value. However, a glance at the chart of the bootstrap
distribution shows that a normal approximation is suspect.
The bootstrap hybrid interval is even shorter and does not cover the
true value. The hybrid interval is poor because the bootstrap
distribution is less variable and far more skewed than the true sampling
distribution.
The plug-in estimate is very biased, so it is no surprise that the
bootstrap PCTL method works poorly. However, the PCTL interval lies
entirely above the plug-in estimate, a dramatic illustration of Hall's
claim that the PCTL interval is "backwards"!
The bootstrap BC interval is extremely short and is not even close to
the true value. The lower percentile point for computing the BC
interval is .00000000010453, so billions of resamples would be required
for an accurate approximation. The lower percentile point for the BCa
interval is even smaller at 7.3099E-17, and would require an
astronomical number of resamples for an accurate approximation.
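The size of the BC correction can be worked backwards from the percentile point quoted above. The Python sketch below assumes the standard BC rule, in which the lower percentile point is Phi(2*z0 + z(alpha/2)) with alpha = 0.05; whether the macro uses exactly this rule is an assumption, and the variable names are invented for the example.

```python
from statistics import NormalDist

norm = NormalDist()
alpha = 0.05
reported_point = 1.0453e-10  # lower BC percentile point quoted above

# Solve reported_point = Phi(2*z0 + z_(alpha/2)) for the bias correction z0.
z0 = (norm.inv_cdf(reported_point) - norm.inv_cdf(alpha / 2)) / 2
# z0 = Phi^-1(share of bootstrap values below the plug-in estimate).
share_below = norm.cdf(z0)
# Number of resamples needed to expect a single value below the point.
resamples_for_one = 1 / reported_point

print(round(z0, 2), round(share_below, 3), f"{resamples_for_one:.1e}")
```

Under these assumptions z0 comes out near -2.2, which corresponds to only about 1.4% of the bootstrap values falling below the plug-in estimate. Roughly 9.6 billion resamples would be needed to expect even one value below the adjusted percentile point, consistent with the "billions" mentioned above.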
The bootstrap t interval has a wildly negative lower limit, and the upper
limit is rather low, but the interval covers the true value.
A simulation was performed by repeating the above analysis 650 times for
randomly generated data sets. For each method, the coverage probability
(COVERAGE), the average length (LENGTH), and the positive part of the
length (POSLEN=AUCL-MAX(0,ALCL)) were computed. Among the jackknife and
bootstrap methods, the only acceptable coverage probability is for the
bootstrap t interval, which is nevertheless very poor with regard to the
length of the interval. Considering only the positive part of the
interval, the bootstrap t interval is quite good, but it works well in
this example only because we know a lower bound for the parameter and
have an analytic expression for the standard error.
METHOD               COVERAGE     LENGTH      POSLEN
----------------------------------------------------
Bootstrap BC          0.00462    0.148968    0.148968
Bootstrap BCa         0.00462    0.148733    0.148733
Bootstrap Hybrid      0.43320    0.417957    0.356943
Bootstrap Normal      0.58372    0.484523    0.366837
Bootstrap PCTL        0          0.417957    0.417957
Bootstrap t           0.96160    4.333339    0.542594
Jackknife             0.66206    0.563169    0.334087
Normal theory         0.96006    0.576008    0.576008
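The three summary columns can be computed from a collection of simulated intervals as in the following Python sketch. It reads AUCL and ALCL as the upper and lower limits averaged over the simulated data sets, which is an interpretation of the formula given above rather than something the post states. The function name and the four toy intervals are invented for the example.

```python
def summarize(intervals, true_value):
    """Coverage, mean length, and positive part of the length for (lcl, ucl) pairs."""
    n = len(intervals)
    coverage = sum(lcl <= true_value <= ucl for lcl, ucl in intervals) / n
    alcl = sum(lcl for lcl, _ in intervals) / n  # average lower limit
    aucl = sum(ucl for _, ucl in intervals) / n  # average upper limit
    return {
        "COVERAGE": coverage,
        "LENGTH": aucl - alcl,
        "POSLEN": aucl - max(0.0, alcl),
    }

# Hypothetical intervals for a parameter whose true value is 0.10.
toy = [(-0.2, 0.5), (0.05, 0.6), (0.2, 0.7), (-0.1, 0.4)]
summary = summarize(toy, true_value=0.10)
print(summary)
```

POSLEN differs from LENGTH only when the average lower limit is negative, which is why the two columns agree for methods whose limits stay within the range of R**2.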
References
----------
E&T     Efron, B. and Tibshirani, R.J. (1993), An Introduction to the
        Bootstrap, New York: Chapman & Hall.
Hall    Hall, P. (1992), The Bootstrap and Edgeworth Expansion, New York:
        Springer-Verlag.
Hjorth  Hjorth, J.S.U. (1994), Computer Intensive Statistical Methods,
        London: Chapman & Hall.
S&T     Shao, J. and Tu, D. (1995), The Jackknife and Bootstrap, New York:
        Springer-Verlag.
--
Warren S. Sarle       SAS Institute Inc.    The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive      are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA   those of SAS Institute.
* * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html