====================John D'Errico, 24 Mar 1995/July 2001=======ssm
From: John D'Errico
Subject: Gaussian FAQ
Message-ID: <3kv7m3$61f@thetimes.pixel.kodak.com>
[ RFU: JD'E provided an extended and corrected (one confusion of
sigma/sigma^2) version on July 19, 2001. This replaces the original.]
This is a FAQ which attempts to cover questions about the normal
(Gaussian) distribution.
Many thanks are due to those who have suggested enhancements.
I will endeavor to keep this FAQ updated as I receive comments.
Questions to be answered in this FAQ are:
0. Why is the normal (Gaussian) distribution important?
1. What is the normal density function?
2. What is the cumulative normal distribution function?
3. How is ERF related to the cumulative normal distribution function?
4. What is a good approximation to the cumulative normal function?
5. What is a good approximation to the inverse of the cumulative
distribution normal function?
6. How do you generate a normally distributed random variable?
7. If X is a normally distributed random variable with mean = 0
standard deviation = 1, how do you convert it into one with mean = mu
and standard deviation = sigma?
8. How do you generate bivariate or multivariate random normal
variates with a specified correlation matrix? Covariance matrix?
9. If X is a normally distributed random variable, what is the
distribution of f(X)?
10. I have data which appears to be a mixture of one or more Gaussian
modes. How can I estimate the parameters of this function?
11. What are some good references on these topics?
======================================================================
Answers to questions 0-11:
0. Why is the normal (Gaussian) distribution important?
The normal (or Gaussian) distribution is one which appears in an
incredible variety of statistical applications. A good reason
for this is the central limit theorem. This theorem tells us that
sums of random variables will, under the appropriate conditions,
tend to be approximately normally distributed. Even when the right
conditions are not met however, the distributions found for many
experimentally generated sets of data still tend to have a bell
shaped curve that often looks quite like that of a normal.
Even when a distribution may not be truly normal, it may still
be convenient to assume that a normal distribution is a good
approximation. In this case, we can describe the entire distribution
by simply a mean and a variance, two easily computed parameters that
all statisticians and experimentors understand and can use for
comparison. The normal distribution has become a convenient standard
for all to use.
----------------------------------------------------------------------
1. What is the normal density function?
The standard (mean=0 and variance=1) normal density function is
1 -(1/2)*x^2
Z(x) = ---------- e
sqrt(2*pi)
For mean=mu and variance=sigma^2
1 -(1/2)*((x-mu)/sigma)^2
Z(x) = ---------------- e
sigma*sqrt(2*pi)
The normal distribution with mean=mu and variance=sigma^2 is
customarily referred to as N(mu,sigma^2). The standard normal is
N(0,1). Note that some writers use a slighly different convention
for the second parameter, whereas we assume this second parameter
to be the variance (sigma^2), some use this notation to show the
standard deviation (sigma) instead.
----------------------------------------------------------------------
2. What is the cumulative normal distribution function?
The cumulative normal, referred to here by P(X), is simply the
integral of Z(x) dx from minus infinity to X.
We will use Q(X) to refer to the integral of Z(x) dx from X to
plus infinity.
P(X) + Q(X) = 1
----------------------------------------------------------------------
3. How is ERF related to the cumulative normal distribution function?
Since some systems do not provide a cumulative normal function but do
provide ERF, this can be useful if you know the transformation between
them.
ERF(X) = integral [2/sqrt(pi)]*exp(-x^2) dx from 0 to X.
P(X) = [ERF(X/sqrt(2)) + 1]/2
----------------------------------------------------------------------
4. What is a good approximation to the cumulative normal function?
While it is not possible to write a solution to this in a simple,
closed form expression not involving infinite sums, there are many
good approximations available. It is not necessary to resort to
numerical integration techniques such as trapezoidal or Simpson's
rule. In fact, far more accuracy can be achieved through one of the
approximations which can be found in the references listed below.
One simple approximation to P(X), the cumulative normal, is:
(from Abramowitz and Stegun)
Let t = 1/(1+pX)
where p=.33267
then
P(x) = 1-Z(x)*[a_1*t + a_2*t^2 + a_3*t^3]
where
a_1 = 0.4361836
a_2 = -0.1201676
a_3 = 0.9372980
The error in this approximation is less than 0.00001 for all X.
Other approximations can be found in Abramowitz & Stegun; Press, et al;
and in Hart, et al. In particular, Hart has a very extensive set of
approximations, some with accuracy over 20 significant digits within
a given region.
----------------------------------------------------------------------
5. What is a good approximation to the inverse of the cumulative
normal distribution function?
Mathematically we can write this as: Find X such that Q(X) = p for
any 0 < p < 1. (Note: this computes an upper tail probability.)
Again, it is not possible to write this as a closed form expression,
so we resort to approximations. Because of the symmetry of the normal
distribution, we need only consider 0 < p < 0.5. If you have p > 0.5,
then apply the algorithm below to q = 1-p, and then negate the value
for X obtained. (This approximation is also from Abramowitz and
Stegun.)
t = sqrt[ ln(1/p^2) ]
c_0 + c_1*t + c_2*t^2
X = t - ------------------------------
1 + d_1*t + d_2*t^2 + d_3*t^3
c_0 = 2.515517
c_1 = 0.802853
c_2 = 0.010328
d_1 = 1.432788
d_2 = 0.189269
d_3 = 0.001308
See Abramowitz and Stegun; Press, et al.
----------------------------------------------------------------------
6. How do you generate a normally distributed random variable?
There are several good ways to do this, some better than others.
It is assumed that the user has available a good uniform pseudo-
random deviate generator available. This is the starting point
for all such approaches.
A good and logical way to generate a normal random deviate is to
look back to the answer to question (5). Simply generate a single
uniform [0,1] random deviate X, and then invert the cumulative
normal at that point. If the inverse is an accurate one, then that
number will indeed be a pseudo-random normal deviate. This method
will be as fast and as good as is the approximation you choose for
the inverse of the cumulative normal.
The next algorithm we will describe is often used when the exact
normality of the deviates is not deemed important and the user
just wants a quickly programmed, roughly normal deviate. This is
accomplished by making use of the central limit theorem. This
says that the sum of a set of independent, identically distributed
(commonly abbreviated as iid) random numbers will approach
normality as enough of them are added together.
So, simply generate n uniform [0,1] random numbers, and sum them
up. Then shift and scale this sum to have the appropriate mean
and variance to give a roughly N(0,1) random variable. Thus,
sum(X_i) - n/2
Z = --------------
sqrt(n/12)
The most common choices of n are 3 and 12. n=3 requires only a
bit shift instead of a divide to scale it
Z = ( x_1 + x_2 + x_3 )*2 + 3
Of course, n=3 will result in a very poor approximation to a
normal deviate. If we choose n=12, no scaling will be needed
at all, and a histogram would indeed start to look very normal
although the tails will still be inadequate for many users.
Remember though, this method still requires the generation of
12 uniform deviates for every normal one it provides.
The other common method, which generates pairs of normal deviates
at a time (or if you want, bivariate normal deviates) is the Box-
Muller transformation. See Numerical Recipes (Press, et. al.) for
an explanation of this method.
A flaw with the Box-Muller method is an interaction with Linear
Congruential Generators (the most common way to obtain the uniform
inputs) such that the pairs generated always fall on a spiral.
See, for example, Ripley's "Stochastic Simulation" text, or "A Guide
to Simulation" by Bratley, Fox, and Schrage. Because of this
Box-Muller should probably *not* be recommended as a first choice.
----------------------------------------------------------------------
7. If X is a N(0,1) distributed random variable, how do you convert
it into N(mu,sigma^2)?
Referring back to (1), we see that a simple transformation will
accomplish this.
Z = mu + sigma*X
----------------------------------------------------------------------
8. How do you generate multivariate random normal variates with
a specified correlation matrix and set of variances?
Lets start with a simple case, just a bivariate normal. We will
assume s1 and s2 are the standard deviations, c is the desired
correlation between them (-1 <= c <= 1) and mu1 and mu2 are the
desired means.
First generate two unit normal random deviates, N1 and N2.
X = mu1 + s1*N1
Y = mu2 + c*s2*N1 + sqrt(1-c^2)*s2*N2
X and Y will be bivariate random normal deviates with the desired
distribution parameters.
Now for the more general case of a multivariate normal where "multi"
means larger than two.
Suppose we have a correlation matrix C. We need to convert this to
a covariance matrix, S, first.
Let D be a diagonal matrix with the desired variable standard
deviations on its diagonal. Then
S = D C D
Form the Cholesky factorization of S (actually any matrix "square
root" of s will do.) This is a factorization of S in terms of a lower
triangular matrix L such that
S = L' L
This factorization always exists whenever S is positive definite
(non-singular in this case, since we started from a supposedly
valid correlation matrix.)
Generate an n by p array, A, of N(0,1) random deviates, where p is
the number of variables and n is the number of multivariate sample
you wish to end up with. Then
B = A L'
is an array of normally distributed random deviates where each row
is an event with roughly the correlation/covariance matrix you wanted.
(It won't be exactly that because of pseudo-random variability.)
Once you have generated a set of random deviates with the desired
correlation matrix and mean vector zero, simply add a vector offset
to obtain any desired mean.
If the correlation/covariance matrix supplied is singular, a similar
approach can still be used, but now we must define L using a different
type of matrix square root. This can be achieved using an eigenvalue
decomposition of the matrix S.
----------------------------------------------------------------------
9. If X is a normally distributed random variable, what is the
distribution of f(X)?
There are several simple cases we can cover. If X and Y are N(0,1)
independent R.V.s, then
aX + bY is a normal random variable for constants a and b.
X/Y is a Cauchy random variable (and its moments are undefined.)
e^X is a lognormal random variable.
X^2 + Y^2 is a Chi-squared random variable with two degrees of
freedom.
sqrt(x^2 + y^2) is a Rayleigh random variable.
These distributions along with many others can all be found in
Johnson, et al.
The general problem of determining the distribution of a function of
one or more random variables is known as statistical tolerancing.
This can be done by several techniques, Monte Carlo methods, first
or second order Taylor series approximations, and Taguchi style
methods (actually related to Gaussian quadrature) to name a few.
A common approach it to estimate the first few (generally either
2 or 4) moments of the transformation in question and then find a
distribution which has those same moments. A family of distributions
which can be used to match the first 4 moments is the Pearson system
(see Johnson, et al.)
See D'Errico & Zaino for a discussion of statistical tolerancing.
----------------------------------------------------------------------
10. I have data which appears to be a mixture of one or more Gaussian
modes. How can I estimate the parameters of this function?
For n modes and the i'th data point, the model will generally
be of the form
Y_i = sum[a_k*P(X_i;mu_k,sigma_k)] + error
where the summation index k runs from 1 to n. This problem is
a nonlinear regression, because the parameters mu and sigma
enter nonlinearly in the gaussian modes P(X;mu,sigma). Resolving
these modes can be done with any optimization package, although
use of a constrained optimization routine will generally be of
value here. Without use of some appropriate constraints the
optimization may often diverge or converge to a spurious local
minimizer.
Some suggestions as to appropriate constraints to use and the
rationale for these constraints:
a_k >= 0
All Gaussian modes in a mixture should have positive
coefficients.
sigma_min <= sigma_k <= sigma_max
This prevents the individual modes from becoming either
spikes or very broad, nearly constant functions.
x_min <= mu_k <= x_max
The peaks of each of the modes should generally fall within
the range of the data.
abs(mu_j - mu_k) >= delta
Pairs of individual modes which have peaks too near each
other are often virtually impossible to resolve unless one
is much narrower than the other.
Note that if no constraints are applied on the a_k, then multiple
linear regression can be used to estimate the values of the a_k
given mu_k and sigma_k. This allows a reduction in the size of
the parameter space for the nonlinear optimization.
Failure of any of these constraints will often suggest that
the optimization is heading into an undesirable region of the
parameter space. This is often the result of attempting to
fit too many Gaussian modes to too little data, poor starting
values for the optimization, or very noisy data. Using the
constraints listed above will at least make the nonlinear
regression much more robust to poor starting values or noisy
data.
See Gill, Murray, & Wright for a discussion of constrained
optimization methods.
----------------------------------------------------------------------
11. What are some good references on these topics?
The references listed here are not necessarily the best possible ones
for this subject.
1. Abramowitz & Stegun, "Handbook of Mathematical Functions",
Dover Publications, 1965 (originally published by the National
Bureau of Standards, 1964)
2. D'Errico & Zaino, "Statistical Tolerancing Using a Modification
of Taguchi's Method", Technometrics, Vol 30, no. 4
3. Devroye, "Non-Uniform Random Variate Generation",
4. Gill, Murray, Wright, "Practical Optimization", Academic Press,
1981
5. Hart, et al, "Computer Approximations", Wiley, 1968
6. Johnson, Kotz, & Balakrishnan, "Continuous Univariate
Distributions", Volume 1, Wiley, 1994
7. Kalos & Whitlock, "Monte Carlo Methods", Wiley, 1986
8. Knuth, "The Art of Computer Programming, Volume 2, Seminumerical
Algorithms", Addison-Wesley, 1969
9. Press, Flannery, Teukolsky, Vetterling, "Numerical Recipes",
Cambridge University Press, 1986
======================================================================
John D'Errico, john.derrico@kodak.com
Any errors or omissions are my fault and mine alone.
======================================================================
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
FAQ top.
Ulrich home page.
Ulrich FAQ.
http://www.pitt.edu/~wpilib/stats99.html