• ====================John D'Errico, 24 Mar 1995/July 2001=======ssm From: John D'Errico <derrico@pixel.kodak.com> Subject: Gaussian FAQ Message-ID: <3kv7m3$61f@thetimes.pixel.kodak.com> [ RFU: JD'E provided an extended and corrected (one confusion of sigma/sigma^2) version on July 19, 2001. This replaces the original.] This is a FAQ which attempts to cover questions about the normal (Gaussian) distribution. Many thanks are due to those who have suggested enhancements. I will endeavor to keep this FAQ updated as I receive comments. Questions to be answered in this FAQ are: 0. Why is the normal (Gaussian) distribution important? 1. What is the normal density function? 2. What is the cumulative normal distribution function? 3. How is ERF related to the cumulative normal distribution function? 4. What is a good approximation to the cumulative normal function? 5. What is a good approximation to the inverse of the cumulative distribution normal function? 6. How do you generate a normally distributed random variable? 7. If X is a normally distributed random variable with mean = 0 standard deviation = 1, how do you convert it into one with mean = mu and standard deviation = sigma? 8. How do you generate bivariate or multivariate random normal variates with a specified correlation matrix? Covariance matrix? 9. If X is a normally distributed random variable, what is the distribution of f(X)? 10. I have data which appears to be a mixture of one or more Gaussian modes. How can I estimate the parameters of this function? 11. What are some good references on these topics? ====================================================================== Answers to questions 0-11: 0. Why is the normal (Gaussian) distribution important? The normal (or Gaussian) distribution is one which appears in an incredible variety of statistical applications. A good reason for this is the central limit theorem. This theorem tells us that sums of random variables will, under the appropriate conditions, tend to be approximately normally distributed. Even when the right conditions are not met however, the distributions found for many experimentally generated sets of data still tend to have a bell shaped curve that often looks quite like that of a normal. Even when a distribution may not be truly normal, it may still be convenient to assume that a normal distribution is a good approximation. In this case, we can describe the entire distribution by simply a mean and a variance, two easily computed parameters that all statisticians and experimentors understand and can use for comparison. The normal distribution has become a convenient standard for all to use. ---------------------------------------------------------------------- 1. What is the normal density function? The standard (mean=0 and variance=1) normal density function is 1 -(1/2)*x^2 Z(x) = ---------- e sqrt(2*pi) For mean=mu and variance=sigma^2 1 -(1/2)*((x-mu)/sigma)^2 Z(x) = ---------------- e sigma*sqrt(2*pi) The normal distribution with mean=mu and variance=sigma^2 is customarily referred to as N(mu,sigma^2). The standard normal is N(0,1). Note that some writers use a slighly different convention for the second parameter, whereas we assume this second parameter to be the variance (sigma^2), some use this notation to show the standard deviation (sigma) instead. ---------------------------------------------------------------------- 2. What is the cumulative normal distribution function? The cumulative normal, referred to here by P(X), is simply the integral of Z(x) dx from minus infinity to X. We will use Q(X) to refer to the integral of Z(x) dx from X to plus infinity. P(X) + Q(X) = 1 ---------------------------------------------------------------------- 3. How is ERF related to the cumulative normal distribution function? Since some systems do not provide a cumulative normal function but do provide ERF, this can be useful if you know the transformation between them. ERF(X) = integral [2/sqrt(pi)]*exp(-x^2) dx from 0 to X. P(X) = [ERF(X/sqrt(2)) + 1]/2 ---------------------------------------------------------------------- 4. What is a good approximation to the cumulative normal function? While it is not possible to write a solution to this in a simple, closed form expression not involving infinite sums, there are many good approximations available. It is not necessary to resort to numerical integration techniques such as trapezoidal or Simpson's rule. In fact, far more accuracy can be achieved through one of the approximations which can be found in the references listed below. One simple approximation to P(X), the cumulative normal, is: (from Abramowitz and Stegun) Let t = 1/(1+pX) where p=.33267 then P(x) = 1-Z(x)*[a_1*t + a_2*t^2 + a_3*t^3] where a_1 = 0.4361836 a_2 = -0.1201676 a_3 = 0.9372980 The error in this approximation is less than 0.00001 for all X. Other approximations can be found in Abramowitz & Stegun; Press, et al; and in Hart, et al. In particular, Hart has a very extensive set of approximations, some with accuracy over 20 significant digits within a given region. ---------------------------------------------------------------------- 5. What is a good approximation to the inverse of the cumulative normal distribution function? Mathematically we can write this as: Find X such that Q(X) = p for any 0 < p < 1. (Note: this computes an upper tail probability.) Again, it is not possible to write this as a closed form expression, so we resort to approximations. Because of the symmetry of the normal distribution, we need only consider 0 < p < 0.5. If you have p > 0.5, then apply the algorithm below to q = 1-p, and then negate the value for X obtained. (This approximation is also from Abramowitz and Stegun.) t = sqrt[ ln(1/p^2) ] c_0 + c_1*t + c_2*t^2 X = t - ------------------------------ 1 + d_1*t + d_2*t^2 + d_3*t^3 c_0 = 2.515517 c_1 = 0.802853 c_2 = 0.010328 d_1 = 1.432788 d_2 = 0.189269 d_3 = 0.001308 See Abramowitz and Stegun; Press, et al. ---------------------------------------------------------------------- 6. How do you generate a normally distributed random variable? There are several good ways to do this, some better than others. It is assumed that the user has available a good uniform pseudo- random deviate generator available. This is the starting point for all such approaches. A good and logical way to generate a normal random deviate is to look back to the answer to question (5). Simply generate a single uniform [0,1] random deviate X, and then invert the cumulative normal at that point. If the inverse is an accurate one, then that number will indeed be a pseudo-random normal deviate. This method will be as fast and as good as is the approximation you choose for the inverse of the cumulative normal. The next algorithm we will describe is often used when the exact normality of the deviates is not deemed important and the user just wants a quickly programmed, roughly normal deviate. This is accomplished by making use of the central limit theorem. This says that the sum of a set of independent, identically distributed (commonly abbreviated as iid) random numbers will approach normality as enough of them are added together. So, simply generate n uniform [0,1] random numbers, and sum them up. Then shift and scale this sum to have the appropriate mean and variance to give a roughly N(0,1) random variable. Thus, sum(X_i) - n/2 Z = -------------- sqrt(n/12) The most common choices of n are 3 and 12. n=3 requires only a bit shift instead of a divide to scale it Z = ( x_1 + x_2 + x_3 )*2 + 3 Of course, n=3 will result in a very poor approximation to a normal deviate. If we choose n=12, no scaling will be needed at all, and a histogram would indeed start to look very normal although the tails will still be inadequate for many users. Remember though, this method still requires the generation of 12 uniform deviates for every normal one it provides. The other common method, which generates pairs of normal deviates at a time (or if you want, bivariate normal deviates) is the Box- Muller transformation. See Numerical Recipes (Press, et. al.) for an explanation of this method. A flaw with the Box-Muller method is an interaction with Linear Congruential Generators (the most common way to obtain the uniform inputs) such that the pairs generated always fall on a spiral. See, for example, Ripley's "Stochastic Simulation" text, or "A Guide to Simulation" by Bratley, Fox, and Schrage. Because of this Box-Muller should probably *not* be recommended as a first choice. ---------------------------------------------------------------------- 7. If X is a N(0,1) distributed random variable, how do you convert it into N(mu,sigma^2)? Referring back to (1), we see that a simple transformation will accomplish this. Z = mu + sigma*X ---------------------------------------------------------------------- 8. How do you generate multivariate random normal variates with a specified correlation matrix and set of variances? Lets start with a simple case, just a bivariate normal. We will assume s1 and s2 are the standard deviations, c is the desired correlation between them (-1 <= c <= 1) and mu1 and mu2 are the desired means. First generate two unit normal random deviates, N1 and N2. X = mu1 + s1*N1 Y = mu2 + c*s2*N1 + sqrt(1-c^2)*s2*N2 X and Y will be bivariate random normal deviates with the desired distribution parameters. Now for the more general case of a multivariate normal where "multi" means larger than two. Suppose we have a correlation matrix C. We need to convert this to a covariance matrix, S, first. Let D be a diagonal matrix with the desired variable standard deviations on its diagonal. Then S = D C D Form the Cholesky factorization of S (actually any matrix "square root" of s will do.) This is a factorization of S in terms of a lower triangular matrix L such that S = L' L This factorization always exists whenever S is positive definite (non-singular in this case, since we started from a supposedly valid correlation matrix.) Generate an n by p array, A, of N(0,1) random deviates, where p is the number of variables and n is the number of multivariate sample you wish to end up with. Then B = A L' is an array of normally distributed random deviates where each row is an event with roughly the correlation/covariance matrix you wanted. (It won't be exactly that because of pseudo-random variability.) Once you have generated a set of random deviates with the desired correlation matrix and mean vector zero, simply add a vector offset to obtain any desired mean. If the correlation/covariance matrix supplied is singular, a similar approach can still be used, but now we must define L using a different type of matrix square root. This can be achieved using an eigenvalue decomposition of the matrix S. ---------------------------------------------------------------------- 9. If X is a normally distributed random variable, what is the distribution of f(X)? There are several simple cases we can cover. If X and Y are N(0,1) independent R.V.s, then aX + bY is a normal random variable for constants a and b. X/Y is a Cauchy random variable (and its moments are undefined.) e^X is a lognormal random variable. X^2 + Y^2 is a Chi-squared random variable with two degrees of freedom. sqrt(x^2 + y^2) is a Rayleigh random variable. These distributions along with many others can all be found in Johnson, et al. The general problem of determining the distribution of a function of one or more random variables is known as statistical tolerancing. This can be done by several techniques, Monte Carlo methods, first or second order Taylor series approximations, and Taguchi style methods (actually related to Gaussian quadrature) to name a few. A common approach it to estimate the first few (generally either 2 or 4) moments of the transformation in question and then find a distribution which has those same moments. A family of distributions which can be used to match the first 4 moments is the Pearson system (see Johnson, et al.) See D'Errico & Zaino for a discussion of statistical tolerancing. ---------------------------------------------------------------------- 10. I have data which appears to be a mixture of one or more Gaussian modes. How can I estimate the parameters of this function? For n modes and the i'th data point, the model will generally be of the form Y_i = sum[a_k*P(X_i;mu_k,sigma_k)] + error where the summation index k runs from 1 to n. This problem is a nonlinear regression, because the parameters mu and sigma enter nonlinearly in the gaussian modes P(X;mu,sigma). Resolving these modes can be done with any optimization package, although use of a constrained optimization routine will generally be of value here. Without use of some appropriate constraints the optimization may often diverge or converge to a spurious local minimizer. Some suggestions as to appropriate constraints to use and the rationale for these constraints: a_k >= 0 All Gaussian modes in a mixture should have positive coefficients. sigma_min <= sigma_k <= sigma_max This prevents the individual modes from becoming either spikes or very broad, nearly constant functions. x_min <= mu_k <= x_max The peaks of each of the modes should generally fall within the range of the data. abs(mu_j - mu_k) >= delta Pairs of individual modes which have peaks too near each other are often virtually impossible to resolve unless one is much narrower than the other. Note that if no constraints are applied on the a_k, then multiple linear regression can be used to estimate the values of the a_k given mu_k and sigma_k. This allows a reduction in the size of the parameter space for the nonlinear optimization. Failure of any of these constraints will often suggest that the optimization is heading into an undesirable region of the parameter space. This is often the result of attempting to fit too many Gaussian modes to too little data, poor starting values for the optimization, or very noisy data. Using the constraints listed above will at least make the nonlinear regression much more robust to poor starting values or noisy data. See Gill, Murray, & Wright for a discussion of constrained optimization methods. ---------------------------------------------------------------------- 11. What are some good references on these topics? The references listed here are not necessarily the best possible ones for this subject. 1. Abramowitz & Stegun, "Handbook of Mathematical Functions", Dover Publications, 1965 (originally published by the National Bureau of Standards, 1964) 2. D'Errico & Zaino, "Statistical Tolerancing Using a Modification of Taguchi's Method", Technometrics, Vol 30, no. 4 3. Devroye, "Non-Uniform Random Variate Generation", 4. Gill, Murray, Wright, "Practical Optimization", Academic Press, 1981 5. Hart, et al, "Computer Approximations", Wiley, 1968 6. Johnson, Kotz, & Balakrishnan, "Continuous Univariate Distributions", Volume 1, Wiley, 1994 7. Kalos & Whitlock, "Monte Carlo Methods", Wiley, 1986 8. Knuth, "The Art of Computer Programming, Volume 2, Seminumerical Algorithms", Addison-Wesley, 1969 9. Press, Flannery, Teukolsky, Vetterling, "Numerical Recipes", Cambridge University Press, 1986 ====================================================================== John D'Errico, john.derrico@kodak.com Any errors or omissions are my fault and mine alone. ======================================================================
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • FAQ top.
  • Ulrich home page.
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html