• As posted to the several .stats groups.

=======================Warren Sarle, 29 Dec 1995==========ssm,ssc,sse
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Neural nets and probability (Was Re: comment - ? statistics ...
Message-ID: <DKDD58.KL3@unx.sas.com>
Lines: 151

In article <4ba2gr$8t9@sol.sun.csd.unb.ca>, William Knight
<Knight@unb.ca> writes:
|> Are neural nets properly part of statistics ??  Is there any
|> probability content to the subject ??

(1) Yes and no. (2) Yes.

From the comp.ai.neural-nets FAQ:

Q: How are neural networks related to statistical methods?

A: There is considerable overlap between the fields of neural networks
and statistics. Statistics is concerned with data analysis. In neural
network terminology, statistical inference means learning to generalize
from noisy data. Some neural networks are not concerned with data
analysis (e.g., those intended to model biological systems) and
therefore have little to do with statistics. Some neural networks do
not learn (e.g., Hopfield nets) and therefore have little to do with
statistics. Some neural networks can learn successfully only from
noise-free data (e.g., ART or the perceptron rule) and therefore would
not be considered statistical methods. But most neural networks that
can learn to generalize effectively from noisy data are similar or
identical to statistical methods. For example:

* Feedforward nets with no hidden layer (including functional-link
  neural nets and higher-order neural nets) are basically generalized
  linear models.
* Feedforward nets with one hidden layer are closely related to
  projection pursuit regression.
* Probabilistic neural nets are identical to kernel discriminant
  analysis.
* General regression neural nets are identical to Nadaraya-Watson
  kernel regression.
* Kohonen nets for adaptive vector quantization are very similar to
  k-means cluster analysis.
* Hebbian learning is closely related to principal component analysis.

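As an illustration of one equivalence in the list above, the estimate
produced by a general regression neural net can be written as a
Nadaraya-Watson kernel regression in a few lines. This is only a
minimal sketch and is not taken from the FAQ: the Gaussian kernel, the
one-dimensional inputs, the function name nadaraya_watson, the
bandwidth values and the toy data are all assumptions chosen for the
example.

```python
import math


def nadaraya_watson(x_train, y_train, x_query, bandwidth=1.0):
    """Return the kernel-weighted average of y_train at x_query.

    Assumptions (not from the FAQ): one-dimensional inputs and a
    Gaussian kernel. A general regression neural net computes the same
    weighted average, with one unit per training case.
    """
    # One kernel weight per training case, largest for cases near x_query.
    weights = [
        math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train
    ]
    # Weighted average of the training targets.
    weighted_sum = sum(w * y for w, y in zip(weights, y_train))
    return weighted_sum / sum(weights)


# Hypothetical toy data scattered around the line y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

# Estimate at x = 2 with a moderate bandwidth.
estimate = nadaraya_watson(xs, ys, x_query=2.0, bandwidth=0.5)

# A very wide kernel weights all cases almost equally, so the estimate
# approaches the plain arithmetic mean of the targets.
wide_estimate = nadaraya_watson(xs, ys, x_query=2.0, bandwidth=1e6)
```

The bandwidth plays the role of the smoothing parameter: a narrow
kernel follows the nearest training cases closely, while a wide kernel
flattens the estimate toward the overall mean of the targets.
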
Some neural network areas that appear to have no close relatives in the
existing statistical literature are:

* Kohonen's self-organizing maps.
* Reinforcement learning (although this is treated in the operations
  research literature as Markov decision processes).
* Stopped training (the purpose and effect of stopped training are
  similar to shrinkage estimation, but the method is quite different).

Feedforward nets are a subset of the class of nonlinear regression and
discrimination models. Statisticians have studied the properties of
this general class but had not considered the specific case of
feedforward neural nets before such networks were popularized in the
neural network field. Still, many results from the statistical theory
of nonlinear models apply directly to feedforward nets, and the methods
that are commonly used for fitting nonlinear models, such as various
Levenberg-Marquardt and conjugate gradient algorithms, can be used to
train feedforward nets.

While neural nets are often defined in terms of their algorithms or
implementations, statistical methods are usually defined in terms of
their results. The arithmetic mean, for example, can be computed by a
(very simple) backprop net, by applying the usual formula SUM(x_i)/n,
or by various other methods. What you get is still an arithmetic mean
regardless of how you compute it. So a statistician would consider
standard backprop, Quickprop, and Levenberg-Marquardt as different
algorithms for implementing the same statistical model such as a
feedforward net. On the other hand, different training criteria, such
as least squares and cross entropy, are viewed by statisticians as
fundamentally different estimation methods with different statistical
properties.

It is sometimes claimed that neural networks, unlike statistical
models, require no distributional assumptions.
In fact, neural networks involve exactly the same sort of
distributional assumptions as statistical models, but statisticians
study the consequences and importance of these assumptions while most
neural networkers ignore them. For example, least-squares training
methods are widely used by statisticians and neural networkers.
Statisticians realize that least-squares training involves implicit
distributional assumptions in that least-squares estimates have certain
optimality properties for noise that is normally distributed with equal
variance for all training cases and that is independent between
different cases. These optimality properties are consequences of the
fact that least-squares estimation is maximum likelihood under those
conditions. Similarly, cross-entropy is maximum likelihood for noise
with a Bernoulli distribution.

If you study the distributional assumptions, then you can recognize and
deal with violations of the assumptions. For example, if you have
normally distributed noise but some training cases have greater noise
variance than others, then you may be able to use weighted least
squares instead of ordinary least squares to obtain more efficient
estimates.

References:

Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994),
"A study of the classification capabilities of neural networks using
unsupervised learning: A comparison with k-means clustering,"
Psychometrika, 59, 509-525.

Chatfield, C. (1993), "Neural networks: Forecasting breakthrough or
passing fad," International Journal of Forecasting, 9, 1-3.

Cheng, B. and Titterington, D.M. (1994), "Neural Networks: A Review
from a Statistical Perspective," Statistical Science, 9, 2-54.

Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural Networks and
the Bias/Variance Dilemma," Neural Computation, 4, 1-58.

Kuan, C.-M. and White, H. (1994), "Artificial Neural Networks: An
Econometric Perspective," Econometric Reviews, 13, 1-91.

Kushner, H. and Clark, D. (1978), _Stochastic Approximation Methods for
Constrained and Unconstrained Systems_, Springer-Verlag.

Michie, D., Spiegelhalter, D.J. and Taylor, C.C. (1994), _Machine
Learning, Neural and Statistical Classification_, Ellis Horwood.

Ripley, B.D. (1993), "Statistical Aspects of Neural Networks," in
O.E. Barndorff-Nielsen, J.L. Jensen and W.S. Kendall, eds., _Networks
and Chaos: Statistical and Probabilistic Aspects_, Chapman & Hall.
ISBN 0 412 46530 2.

Ripley, B.D. (1994), "Neural Networks and Related Methods for
Classification," Journal of the Royal Statistical Society, Series B,
56, 409-456.

Sarle, W.S. (1994), "Neural Networks and Statistical Models,"
Proceedings of the Nineteenth Annual SAS Users Group International
Conference, Cary, NC: SAS Institute, pp. 1538-1550.
(ftp://ftp.sas.com/pub/neural/neural1.ps)

White, H. (1989), "Learning in Artificial Neural Networks: A
Statistical Perspective," Neural Computation, 1, 425-464.

White, H. (1989), "Some Asymptotic Results for Learning in Single
Hidden Layer Feedforward Network Models," Journal of the American
Statistical Association, 84, 1008-1013.

White, H. (1992), _Artificial Neural Networks: Approximation and
Learning Theory_, Blackwell.

--
Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.

* * * * * * * * * * * * * * * * * * * * * * * *
  • Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
  • Ulrich FAQ. http://www.pitt.edu/~wpilib/stats99.html