===================== Warren Sarle, 14 Sept 1995 ======== ssc,ssm
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Bootstrap Confidence Intervals

I am writing documentation for some new bootstrap and jackknife software, and I want to provide some practical, nontechnical advice, especially on the use of bootstrap confidence intervals. The literature on this subject is rather confusing, even in its basic terminology. Occasional posts in the sci.stat.* groups asking bootstrap questions suggest that this material might be of broad interest. There are many unresolved issues of both a theoretical and a practical nature, hence the cross-post to sci.stat.math and sci.stat.consult. I am not an expert on bootstrapping, and I would appreciate suggestions and corrections from those who are.

Little of this post is specific to SAS software. There are a few places where I mention the %BOOT macro, which computes bootstrap bias and standard error estimates and associated confidence intervals assuming a normal sampling distribution, and the %BOOTCI macro, which computes some other bootstrap confidence intervals. The %JACK macro performs the basic leave-one-out jackknife. I will send a copy of the macros to anyone daring enough to do some alpha testing and to write some simple macros (an oxymoron?) of their own.

                   Bootstrap Confidence Intervals
        Copyright (C) 1995 by Warren S. Sarle, Cary, NC, USA

Confidence Intervals
--------------------

The normal (standard) bootstrap confidence interval is accurate only for statistics with an approximately normal sampling distribution. The %BOOTCI macro provides the most commonly used types of bootstrap confidence intervals that:

 * are suitable for statistics with nonnormal sampling distributions and
 * require only a single level of resampling.

It is advisable to specify at least 1000 resamples for a 90% confidence interval. For a higher level of confidence or for the BC and BCa methods, even more resamples should be used.

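As a rough illustration of the normal (standard) bootstrap interval described above, here is a minimal sketch in Python using only the standard library. It is not the %BOOT macro. The function name, its arguments, the sample data, and the decision to center the interval on the bias-corrected estimate are all assumptions made for this example.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_bootstrap_ci(data, stat, n_resamples=1000, level=0.90, seed=0):
    """Normal (standard) bootstrap interval: center +/- z * bootstrap SE."""
    rng = random.Random(seed)
    n = len(data)
    estimate = stat(data)
    # Draw n observations with replacement and recompute the statistic each time.
    boot = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    se = stdev(boot)              # bootstrap standard error estimate
    bias = mean(boot) - estimate  # bootstrap bias estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = estimate - bias      # bias-corrected center: a choice made in this sketch
    return center - z * se, center + z * se, se, bias


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
lower, upper, se, bias = normal_bootstrap_ci(sample, mean)
print(f"90% normal bootstrap interval: ({lower:.3f}, {upper:.3f})")
```

The interval is symmetric about its center by construction, which is why it can be trusted only when the sampling distribution of the statistic is itself roughly normal.
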
The terminology for bootstrap confidence intervals is confused. The keywords used with the %BOOTCI macro follow S&T:

   Keyword      Terms from the references
   -------      -------------------------
   PCTL or      "bootstrap percentile" in S&T;
   PERCENTILE   "percentile" in E&T;
                "other percentile" in Hall;
                "Efron's `backwards' percentile" in Hjorth

   HYBRID       "hybrid" in S&T;
                no term in E&T;
                "percentile" in Hall;
                "simple" in Hjorth

   T            "bootstrap-t" in S&T and E&T;
                "percentile-t" in Hall;
                "studentized" in Hjorth

   BC           "BC" in all

   BCA          "BCa" in S&T, E&T, and Hjorth;
                "ABC" in Hall (for simple random samples and
                certain other special cases only)

There is considerable controversy concerning the use of bootstrap confidence intervals. To fully appreciate the issues, it is important to read S&T and Hall in addition to E&T.

Asymptotically in simple random samples, the T and BCa methods work better than the traditional normal approximation, while the percentile, hybrid, and BC methods have the same accuracy as the traditional normal approximation.

In small samples, things get much more complicated:

 * The percentile method simply uses the alpha/2 and 1-alpha/2 percentiles of the bootstrap distribution to define the interval. This method performs well for quantiles and for statistics that are unbiased and have a symmetric sampling distribution. For a statistic that is biased, the percentile method amplifies the bias. The main virtue of the percentile method and the closely related BC and BCa methods is that the intervals are equivariant under transformation of the parameters. One consequence of this equivariance is that the interval cannot extend beyond the possible range of values of the statistic. In some cases, however, this property can be a vice--see the "Cautionary Example" below.

 * The BC method corrects the percentile interval for bias--median bias, not mean bias. The correction is performed by adjusting the percentile points to values other than alpha/2 and 1-alpha/2.
   If a large correction is required, one of the percentile points will be very small; hence a very large number of resamples will be required to approximate the interval accurately. See the "Cautionary Example" below.

 * The BCa method corrects the percentile interval for bias and skewness. This method requires an estimate of the acceleration, which is related to the skewness of the sampling distribution. The acceleration can be estimated by jackknifing for simple random samples which, of course, requires extra computation. For bootstrapping residuals in regression models, no general method for estimating the acceleration is known. If the acceleration is not estimated accurately, the BCa interval will perform poorly. The length of the BCa interval is not monotonic with respect to alpha (Hall, pp 134-135, 137). For large values of the acceleration and large alpha, the BCa interval is excessively short. The BCa interval is no better than the BC interval for nonsmooth statistics such as the median.

 * The HYBRID method is the reverse of the percentile method. While the percentile method amplifies bias, the HYBRID method automatically adjusts for bias and skewness. The HYBRID method works well if the standard error of the statistic does not depend on any unknown parameters; otherwise, the T method works better if a good estimate of the standard error is available. Of all the methods in %BOOTCI, the HYBRID method seems to be the least likely to yield spectacularly wrong results. The HYBRID method and the closely related T method are not equivariant under transformation of the parameters.

 * The T method requires an estimate of the standard error (or a constant multiple thereof) of each statistic being bootstrapped. This requires more work from the user. If the standard errors are not estimated accurately, the T method may perform poorly. In simulation studies, T intervals are often found to be excessively long.
   E&T (p 160) claim that the T method is erratic and sensitive to outliers.

Numerous other methods exist for bootstrap confidence intervals that require nested resampling, i.e., each resample of the original sample is itself reresampled multiple times. Since the total number of reresamples required is typically 25,000 or more, these methods are extremely expensive and have not yet been implemented in the %BOOT and %BOOTCI macros.

A Cautionary Example
--------------------

Jackknifing and bootstrapping are no remedy for an inadequate sample size. For nonparametric resampling methods, the sample distribution must be reasonably close in some sense to the population distribution to obtain accurate inferences. In parametric methods, only the estimated parameters need be reasonably close to the population parameters to obtain accurate inferences. The smaller the sample size, the greater the fluctuations in the distribution of the sample. Nonparametric methods that are sensitive to a wide variety of such fluctuations will suffer more from small sample sizes than will parametric methods _if_ the assumptions of the parametric methods are valid.

In this example, the purpose of the analysis is to find a 95% confidence interval for R**2 in a linear regression with 20 observations and 10 predictors. The predictors and response are generated from a multivariate normal distribution, so normal-theory methods are applicable. With real data, if the distribution were not known to be normal, you might be tempted to use the jackknife or bootstrap on the theory that normal approximations could not be trusted in such a small sample size. In fact, most of the jackknife and bootstrap methods cannot be trusted either.

This example computes a 95% confidence interval with each of the methods available in %JACK and %BOOT using 1000 resamples. The results are assembled into a single data set called CI for comparison. PROC CANCORR is also used to obtain a normal-theory 95% confidence interval.
Two versions of the %ANALYZE macro are shown, one with CANCORR and one with REG; either version can be used for the analysis.

   title 'A Cautionary Example';
   %let n=20;
   %let p=10;

   ** generate multivariate normal data with true R**2=0.1;
   data x;
      array x x1-x&p;
      do n=1 to &n;
         drop n;
         do over x; x=rannor(123); end;
         p=sum(of x1-x&p)/sqrt(&p);
         e=rannor(123);
         y=p*sqrt(.1)+e*sqrt(.9);
         output;
      end;
   run;

   [lots of SAS code deleted]

The actual sampling distribution of R**2, based on 10000 simulated data sets, looks like this:

   [Histogram: frequency (vertical axis, marked from 200 to 1000) versus R**2 (horizontal axis, 0.00 to 1.00 in steps of 0.04)]

The bootstrap distribution computed from the one data set in this example is not even close to the true sampling distribution:

   [Histogram: frequency (vertical axis, marked from 100 to 300) versus R**2 (horizontal axis, 0.00 to 1.00 in steps of 0.04)]

The confidence intervals computed from this data set are as follows:

                            Lower         Upper
                          Confidence    Confidence
   Method                   Limit         Limit

   Normal theory           0.00000       0.62876
   Jackknife              -0.44648       0.54393
   Bootstrap Normal        0.07400       0.51324
   Bootstrap Hybrid        0.18391       0.56566
   Bootstrap PCTL          0.61824       1.00000
   Bootstrap BC            0.51547       0.57231
   Bootstrap BCa           0.51547       0.57231
   Bootstrap t            -3.11368       0.56556

The true value of R**2 in this example is 0.10, the sample plug-in estimate is 0.59, and the adjusted estimate is 0.14. The normal-theory interval can be considered the "right answer".

The jackknife interval has a negative lower limit, and the upper limit is rather low, but the interval covers the true value.

The bootstrap interval based on a normal approximation is short but does cover the true value. However, a glance at the chart of the bootstrap distribution shows that a normal approximation is suspect.

The bootstrap hybrid interval is even shorter and does not cover the true value. The hybrid interval is poor because the bootstrap distribution is less variable and far more skewed than the true sampling distribution.

The plug-in estimate is very biased, so it is no surprise that the bootstrap PCTL method works poorly. However, the PCTL interval lies entirely above the plug-in estimate, a dramatic illustration of Hall's claim that the PCTL interval is "backwards"!

The bootstrap BC interval is extremely short and is not even close to the true value. The lower percentile point for computing the BC interval is .00000000010453, so billions of resamples would be required for an accurate approximation. The lower percentile point for the BCa interval is even smaller at 7.3099E-17, and would require an astronomical number of resamples for an accurate approximation.

The bootstrap t interval has a wildly negative lower limit, and the upper limit is rather low, but the interval covers the true value.

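Two features of the intervals above can be checked with a short sketch. First, the hybrid limits are the PCTL limits reflected about the plug-in estimate: 0.18391 + 1.00000 and 0.56566 + 0.61824 both come to about 1.184, twice the estimate of 0.59. Second, the BC percentile points collapse toward zero when almost all bootstrap values lie above the estimate. The Python code below is only an illustration and not the %BOOTCI implementation. The function names, the nearest-rank quantile rule, and the stand-in bootstrap values are assumptions, and the BC adjustment uses the textbook formula Phi(2*z0 + z_alpha), which may differ in detail from what the macro computes.

```python
from statistics import NormalDist


def quantile(sorted_vals, q):
    """Empirical quantile by nearest rank (a simple convention for this sketch)."""
    idx = round(q * (len(sorted_vals) - 1))
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def pctl_and_hybrid(boot, estimate, alpha=0.05):
    """Percentile interval and its reflection about the estimate (hybrid)."""
    b = sorted(boot)
    lo, hi = quantile(b, alpha / 2), quantile(b, 1 - alpha / 2)
    return (lo, hi), (2 * estimate - hi, 2 * estimate - lo)


def bc_percentile_points(boot, estimate, alpha=0.05):
    """Adjusted percentile points for the BC interval (textbook formula)."""
    nd = NormalDist()
    m = len(boot)
    # z0 reflects median bias: the share of bootstrap values below the estimate.
    frac = sum(v < estimate for v in boot) / m
    frac = min(max(frac, 0.5 / m), 1 - 0.5 / m)  # keep inv_cdf finite
    z0 = nd.inv_cdf(frac)
    return (nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2)),
            nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2)))


# Hypothetical stand-in for a bootstrap distribution, built so that only
# 14 of 1000 values fall below the estimate.
boot = [0.60 + 0.4 * i / 999 for i in range(1000)]
estimate = boot[14]

pctl, hybrid = pctl_and_hybrid(boot, estimate)
lo_pt, hi_pt = bc_percentile_points(boot, estimate)
print("PCTL:", pctl, "Hybrid:", hybrid)
print("BC percentile points:", lo_pt, hi_pt)
```

With 1000 resamples, the smallest percentile point that can be resolved is about 0.001, so a lower point many orders of magnitude below that cannot be approximated from the resamples at hand.
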
A simulation was performed by repeating the above analysis 650 times for randomly generated data sets. For each method, the coverage probability (COVERAGE), the average length (LENGTH), and the positive part of the length (POSLEN=AUCL-MAX(0,ALCL)) were computed. Among the jackknife and bootstrap methods, the only acceptable coverage probability is for the bootstrap t interval, which is nevertheless very poor with regard to the length of the interval. Considering only the positive part of the interval, the bootstrap t interval is quite good, but it works well in this example only because we know a lower bound for the parameter and have an analytic expression for the standard error.

   METHOD              COVERAGE     LENGTH      POSLEN
   ----------------------------------------------------
   Bootstrap BC         0.00462    0.148968    0.148968
   Bootstrap BCa        0.00462    0.148733    0.148733
   Bootstrap Hybrid     0.43320    0.417957    0.356943
   Bootstrap Normal     0.58372    0.484523    0.366837
   Bootstrap PCTL       0          0.417957    0.417957
   Bootstrap t          0.96160    4.333339    0.542594
   Jackknife            0.66206    0.563169    0.334087
   Normal theory        0.96006    0.576008    0.576008

References
----------

   E&T      Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman & Hall.

   Hall     Hall, P. (1992), The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

   Hjorth   Hjorth, J.S.U. (1994), Computer Intensive Statistical Methods, London: Chapman & Hall.

   S&T      Shao, J. and Tu, D. (1995), The Jackknife and Bootstrap, New York: Springer-Verlag.

--
Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.

* * * * * * * * * * * * * * * * * * * * * * *
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html