As posted to several .stats groups.
=======================Warren Sarle, 29 Dec 1995==========ssm,ssc,sse
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Neural nets and probability (Was Re: comment - ? statistics ...
Message-ID:
Lines: 151
In article <4ba2gr$8t9@sol.sun.csd.unb.ca>,
William Knight writes:
|> Are neural nets properly part of statistics ?? Is there any
|> probability content to the subject ??
(1) Yes and no. (2) Yes. From the comp.ai.neural-nets FAQ:
Q: How are neural networks related to statistical methods?
A: There is considerable overlap between the fields of neural
networks and statistics.
Statistics is concerned with data analysis. In neural network
terminology, statistical inference means learning to generalize from
noisy data. Some neural networks are not concerned with data analysis
(e.g., those intended to model biological systems) and therefore have
little to do with statistics. Some neural networks do not learn (e.g.,
Hopfield nets) and therefore have little to do with statistics. Some
neural networks can learn successfully only from noise-free data (e.g.,
ART or the perceptron rule) and therefore would not be considered
statistical methods. But most neural networks that can learn to
generalize effectively from noisy data are similar or identical to
statistical methods. For example:
* Feedforward nets with no hidden layer (including functional-link
neural nets and higher-order neural nets) are basically
generalized linear models.
* Feedforward nets with one hidden layer are closely related
to projection pursuit regression.
* Probabilistic neural nets are identical to kernel discriminant
analysis.
* General regression neural nets are identical to Nadaraya-Watson
kernel regression.
* Kohonen nets for adaptive vector quantization are very similar
to k-means cluster analysis.
* Hebbian learning is closely related to principal component
analysis.
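To make one item in the list above concrete, here is a minimal Python sketch of Nadaraya-Watson kernel regression with a Gaussian kernel, which is the computation a general regression neural net performs. The function name, the bandwidth value, and the toy data are illustrative assumptions, not part of the FAQ.

```python
import math

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.5):
    """Kernel-weighted average of the training targets at x_query."""
    # One Gaussian kernel weight per training case (the "pattern units" of a GRNN)
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    # Weighted average of targets (the "summation" and "division" units)
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

# Toy data, assumed for illustration only
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.1, 0.9, 2.1, 2.9, 4.2]
prediction = nadaraya_watson(x_train, y_train, 2.0)
```

The bandwidth plays the role of the smoothing parameter in the neural net version. A query far from all training cases would make every weight underflow to zero, which this sketch does not guard against.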
Some neural network areas that appear to have no close relatives in the
existing statistical literature are:
* Kohonen's self-organizing maps.
* Reinforcement learning (although this is treated in the
operations research literature as Markov decision processes).
* Stopped training (the purpose and effect of stopped training are
similar to shrinkage estimation, but the method is quite different).
Feedforward nets are a subset of the class of nonlinear regression and
discrimination models. Statisticians have studied the properties of this
general class but had not considered the specific case of feedforward
neural nets before such networks were popularized in the neural network
field. Still, many results from the statistical theory of nonlinear
models apply directly to feedforward nets, and the methods that are
commonly used for fitting nonlinear models, such as various
Levenberg-Marquardt and conjugate gradient algorithms, can be used to
train feedforward nets.
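As a rough illustration of the last point, the sketch below fits the smallest possible case, a single tanh unit y = tanh(w*x + b), by a basic Levenberg-Marquardt iteration written in plain Python. The function name, starting values, damping schedule, and noise-free toy data are all assumptions made for the example; a real application would use a tested nonlinear least-squares routine.

```python
import math

def fit_tanh_unit(xs, ys, w=0.1, b=0.0, damping=1e-3, iterations=100):
    """Levenberg-Marquardt fit of y = tanh(w*x + b) by nonlinear least squares."""

    def sse(w, b):
        return sum((y - math.tanh(w * x + b)) ** 2 for x, y in zip(xs, ys))

    current = sse(w, b)
    for _ in range(iterations):
        # Accumulate J'J and J'r, where J is the Jacobian of the model output
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            t = math.tanh(w * x + b)
            r = y - t
            db = 1.0 - t * t      # derivative of the output with respect to b
            dw = db * x           # derivative of the output with respect to w
            a11 += dw * dw
            a12 += dw * db
            a22 += db * db
            g1 += dw * r
            g2 += db * r
        # Solve the damped 2x2 normal equations (J'J + damping*I) step = J'r
        a11d, a22d = a11 + damping, a22 + damping
        det = a11d * a22d - a12 * a12
        step_w = (a22d * g1 - a12 * g2) / det
        step_b = (a11d * g2 - a12 * g1) / det
        candidate = sse(w + step_w, b + step_b)
        if candidate < current:
            # Accept the step and lean more on the Gauss-Newton direction
            w, b, current = w + step_w, b + step_b, candidate
            damping /= 10.0
        else:
            # Reject the step and lean more on gradient descent
            damping *= 10.0
    return w, b, current

# Noise-free toy data generated from w = 1.5, b = -0.5 (assumed for illustration)
xs = [i / 5.0 - 2.0 for i in range(21)]
ys = [math.tanh(1.5 * x - 0.5) for x in xs]
w_hat, b_hat, final_sse = fit_tanh_unit(xs, ys)
```

Nothing in the loop is specific to neural nets. It is the same iteration a statistician would apply to any nonlinear regression model with a differentiable mean function.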
While neural nets are often defined in terms of their algorithms or
implementations, statistical methods are usually defined in terms of
their results. The arithmetic mean, for example, can be computed by a
(very simple) backprop net, by applying the usual formula SUM(x_i)/n, or
by various other methods. What you get is still an arithmetic mean
regardless of how you compute it. So a statistician would consider
standard backprop, Quickprop, and Levenberg-Marquardt as different
algorithms for implementing the same statistical model, such as a
feedforward net. On the other hand, different training criteria, such as
least squares and cross entropy, are viewed by statisticians as
fundamentally different estimation methods with different statistical
properties.
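The arithmetic mean example can be sketched in a few lines of Python. The "net" here is assumed to be a single bias weight trained by batch gradient descent on squared error. The learning rate, epoch count, and data are arbitrary choices for illustration.

```python
def mean_by_formula(xs):
    """The usual formula SUM(x_i)/n."""
    return sum(xs) / len(xs)

def mean_by_gradient_descent(xs, learning_rate=0.1, epochs=200):
    """Train a single bias weight b to minimize the mean of (x_i - b)^2."""
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to b
        grad = -2.0 * sum(x - b for x in xs) / n
        b -= learning_rate * grad
    return b

data = [2.0, 4.0, 4.0, 5.0, 10.0]
by_formula = mean_by_formula(data)
by_training = mean_by_gradient_descent(data)
```

Both functions return the same estimate up to numerical tolerance. Only the algorithm differs, which is the sense in which a statistician regards them as one method.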
It is sometimes claimed that neural networks, unlike statistical models,
require no distributional assumptions. In fact, neural networks involve
exactly the same sort of distributional assumptions as statistical
models, but statisticians study the consequences and importance of these
assumptions while most neural networkers ignore them. For example,
least-squares training methods are widely used by statisticians and
neural networkers. Statisticians realize that least-squares training
involves implicit distributional assumptions in that least-squares
estimates have certain optimality properties for noise that is normally
distributed with equal variance for all training cases and that is
independent between different cases. These optimality properties are
consequences of the fact that least-squares estimation is maximum
likelihood under those conditions. Similarly, cross-entropy is maximum
likelihood for noise with a Bernoulli distribution. If you study the
distributional assumptions, then you can recognize and deal with
violations of the assumptions. For example, if you have normally
distributed noise but some training cases have greater noise variance
than others, then you may be able to use weighted least squares instead
of ordinary least squares to obtain more efficient estimates.
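A small simulation shows the efficiency gain. It assumes the simplest possible model, a constant mean, with noise variances that are known and differ between cases. The function names, the noise levels, the seed, and the number of replications are illustrative assumptions.

```python
import random

def ols_mean(ys):
    """Ordinary least-squares estimate of a constant mean: every case weighted equally."""
    return sum(ys) / len(ys)

def wls_mean(ys, variances):
    """Weighted least-squares estimate: each case weighted by 1 / noise variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)              # fixed seed so the run is repeatable
true_mean = 2.0
sigmas = [0.1] * 5 + [3.0] * 5      # five precise cases, five noisy ones
variances = [s * s for s in sigmas]

ols_estimates, wls_estimates = [], []
for _ in range(2000):
    ys = [rng.gauss(true_mean, s) for s in sigmas]
    ols_estimates.append(ols_mean(ys))
    wls_estimates.append(wls_mean(ys, variances))

# Sampling variance of each estimator across the replications
ols_spread = sample_variance(ols_estimates)
wls_spread = sample_variance(wls_estimates)
```

In this setup the theoretical sampling variance of the unweighted mean is about 0.45, against about 0.002 for the weighted one, so the weighted estimates cluster much more tightly around the true value. In practice the variances are rarely known exactly and have to be estimated or modeled.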
References:
Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994),
"A study of the classification capabilities of neural networks using
unsupervised learning: A comparison with k-means clustering",
Psychometrika, 59, 509-525.
Chatfield, C. (1993), "Neural networks: Forecasting breakthrough or
passing fad?", International Journal of Forecasting, 9, 1-3.
Cheng, B. and Titterington, D.M. (1994), "Neural Networks: A Review
from a Statistical Perspective", Statistical Science, 9, 2-54.
Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural Networks
and the Bias/Variance Dilemma", Neural Computation, 4, 1-58.
Kuan, C.-M. and White, H. (1994), "Artificial Neural Networks: An
Econometric Perspective", Econometric Reviews, 13, 1-91.
Kushner, H. and Clark, D. (1978), _Stochastic Approximation Methods for
Constrained and Unconstrained Systems_, Springer-Verlag.
Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994), _Machine
Learning, Neural and Statistical Classification_, Ellis Horwood.
Ripley, B.D. (1993), "Statistical Aspects of Neural Networks", in
O.E. Barndorff-Nielsen, J.L. Jensen and W.S. Kendall, eds.,
_Networks and Chaos: Statistical and Probabilistic Aspects_,
Chapman & Hall. ISBN 0 412 46530 2.
Ripley, B.D. (1994), "Neural Networks and Related Methods for
Classification," Journal of the Royal Statistical Society, Series B,
56, 409-456.
Sarle, W.S. (1994), "Neural Networks and Statistical
Models," Proceedings of the Nineteenth Annual SAS Users
Group International Conference, Cary, NC: SAS Institute,
pp. 1538-1550. (ftp://ftp.sas.com/pub/neural/neural1.ps)
White, H. (1989), "Learning in Artificial Neural Networks: A
Statistical Perspective," Neural Computation, 1, 425-464.
White, H. (1989), "Some Asymptotic Results for Learning in Single
Hidden Layer Feedforward Network Models", J. of the American Statistical
Assoc., 84, 1008-1013.
White, H. (1992), _Artificial Neural Networks: Approximation and
Learning Theory_, Blackwell.
--
Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
* * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html