FAQ - Chap. 1, basics
*************** Definitions, sources, basics ****************
Data entry
========================Ralph Brands, 26 May 1996=======(spss)
From: brinton@unixg.ubc.ca (Ralph Brands)
Subject: Re: data entry programs
Message-ID:
Having tried many strategies through the years, we've settled on FoxPro.
We bought it only for its data-entry capabilities. You can make input
screens, move fields around, set relations to other screens when you're
entering 750 fields, etc., very quickly and with minimal reference to
manuals. We're NOT programmers and we can use it. One of the nicest
features is the XBASE default of moving to the next field, without having
to hit "Enter" or "Tab", as soon as the current field is full. So if
you have a yes/no coded with 1/2, when either of these is entered, the
program moves to the next field (if the field width is 1). This simple
thing is a programming task or impossible in other programs we've used
through the years (Oracle Power Objects, tcl/tk etc).
Cost is $99. Programs are compatible across platforms: you can make a
Windows program on a Mac and vice-versa.
The downside: you are subsidizing Microsoft if you buy it.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Data mining? a text.
(see Stepwise, for generic warnings.)
... Various Web addresses that I have looked for on the subject
of "data mining" were missing when I looked.
==================Gregory Piatetsky-Shapiro, 1 Mar 1996 ========ssm,cdt
From: gps0@harvey (Gregory Piatetsky-Shapiro)
Newsgroups: sci.stat.math,comp.databases.theory
Subject: New Book: Advances in Knowledge Discovery and Data Mining
Message-ID: <4ia8bc$2f6@ceylon.gte.com>
New Book Announcement:
Advances in Knowledge Discovery and Data Mining
-----------------------------------------------
Edited by Usama M. Fayyad, Gregory Piatetsky-Shapiro,
Padhraic Smyth, and Ramasamy Uthurusamy
Published by the AAAI Press / The MIT Press ISBN 0-262-56097-6
March 1996 625 pp. Price: $ 50.00
This book can be ordered online from The MIT Press: http://mitpress.mit.edu/
More info at: http://www-mitpress.mit.edu/mitp/recent-books/comp/fayap.html
http://www.aaai.org/Publications/Press/Catalog/fayyad.html
(This AAAI website also has abstracts of chapters)
----------------------------------------------------------------------------
"Advances in Knowledge Discovery and Data Mining" brings together the latest
research -- in statistics, databases, machine learning, and artificial
intelligence -- that is part of the exciting and rapidly growing field of
Knowledge Discovery and Data Mining. Topics covered include fundamental
issues, classification and clustering, trend and deviation analysis,
dependency modeling, integrated discovery systems, next generation database
systems, and application case studies. The contributors include leading
researchers and practitioners from academia, government laboratories, and
private industry.
Gregory Piatetsky-Shapiro email: gps@gte.com
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
On-line data sources. Schoenfield.
(See other Stat Web pages, generally.)
=====================Michael Schoenfield, 25 Jan 1996========(spss)
Message-ID: <199601260523.XAA24919@execpc.com>
From: "Michael A. Schoenfield"
Subject: Re: Data Sites on the Net
A number of persons (scientists and other searchers for truth and data :-)
had expressed interest in some of my bookmarks which point to data locations
(at least in theory). Because of the number of requests that I've received,
I thought it might be easier to send the bookmarks out to the entire list.
Please feel free to delete if you're not interested.
Here is a sampling from my netscape bookmark list:
Education: http://www.census.gov/org/dusd/edu/about.html
European Monthly Monitoring Survey:
http://www.cec.lu/en/comm/dg10/infcom/epo/polls.html
European Public Opinion: http://www.gallup.com/
Gallup Organization: gopher://icpsr.umich.edu/
Nielsen Media: http://www.nielsenmedia.com/
Nielsen Media Research - Interactive Services:
gopher://burrow.cl.msu.edu/11/internet/msu/pda
Political Data Archives - Michigan State:
http://www.princeton.edu/~abelson/index.html
Princeton Survey Research Center:
http://www.ciesin.org/datasets/irss/irss.html
Public Opinion Item Index - The Institute for Research in Social Science:
http://ren.imagis.iupui.edu/pol/
Public Opinion Laboratory: http://www.golan.org.il/polls.html
Public Opinion Polls about the Golan Heights:
http://www.lib.uconn.edu/RoperCenter/
The Roper Center for Public Opinion Research:
gopher://zonnetje.swidoc.nl/11/
Steinmetz Data Archives:
http://politicsusa.com/PoliticsUSA/news/1106ip08.html.cgi
Times Mirror Study -- PoliticsUSA:
http://cansim.epas.utoronto.ca:5680/pwt/pwt.html
PWT Database Welcome Page: http://ssda.anu.edu.au/
Social Science Data Archives: http://www.uark.edu/depts/comminfo/www/data.html-
American Communication Association: gopher://www.polisci.nwu.edu:70/1
American Politics Gopher at Northwestern Univ.: http://gate1.dda.dk/dda.html
Danish Data Archive: http://dpls.dacc.wisc.edu/
Data and Program Library Service Home Page:
http://www.swidoc.nl/star/staralg.html
Dutch Social Science Data Archive (Steinmetz):
http://www.keele.ac.uk/depts/po/election.htm
Elections and Electoral Systems by Country: http://www.soc.qc.edu/
General Social Science Survey, CUNY:
gopher://liberty.uc.wlu.edu/11/internet/hytelnet/sites2/ful000/ful021
Hebrew Univ. Social Science Data Archives: http://www.tarki.hu/index-e.html
Hungarian Data Archive: http://icpsr.umich.edu/ICPSR_homepage.html
ICPSR - Home Page: gopher://statlab.stat.yale.edu/11/Internet_Stats
InterNET Resources for Social Science Statistics:
http://cc-server9.massey.ac.nz/%7ENZSRDA/
New Zealand Social Science Research Data Archive: http://www.uib.no/nsd/
Norwegian Social Science Data Service: http://www.nau.edu/~srl/
Social Research Lab., N.A.U: http://sosig.esrc.bris.ac.uk/Welcome.html
Social Science Information Gateway: http://www.lib.virginia.edu/socsci/
Social Sciences Data Center: http://www.hsrc.ac.za/sada.html
The South African Data Archive: http://ssdc.ucsd.edu/ssdc/socsci.html
Searchable Catalogs: http://www.stat-usa.gov/stat-usa.textonly.html
Internet: General Data Resources:
http://www.lib.umich.edu/libhome/Documents.center/stats.html
Statistical Resources on the Web: http://WWW.StatCan.CA/
Statistics Canada - Statistique Canada: http://www.ssd.gu.se/enghome.html
Univ. of Alberta Social Science Data Archives: http://ssdc.ucsd.edu/
Univ. of Calif. - San Diego Data Collection:
gopher://gopher.lib.virginia.edu/11/socsci
Univ. of Virginia Social Science Data Archives:
gopher://statlab.stat.yale.edu/11/SSDA
Yale Social Science Data Archives: http://www.census.gov/stat_abstract/
This is a brief sampling of my bookmarks and I hope that you will find them
useful. Please remember to have fun "truckin" through these sites.
Mike S.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
on Teaching. Ward. (see other WEB sites).
=====================Joe Ward, 14 Mar 1996========sse
Message-ID:
From: Joe H Ward
Subject: Re: Teaching Intro Statistics
F:\ASATORON.94\FINALASA.TXT
**** HANDOUT FOR "ADOPT-A-SCHOOL" SESSION, ASA TORONTO, 1994 ***************
****************************************************************************
EMPOWERING HIGH SCHOOL STUDENTS TO EXPLOIT STATISTICAL MODELS AND SOFTWARE
FOR RESEARCH PROJECTS
Joe H. Ward, Jr, Health Careers High School,
Laura J. Niland, MacArthur High School
Joe H. Ward, Jr., 167 E. Arrowhead Dr, San Antonio, TX 78228
Key Words: Adopt-a-School, Linear Models, Computers
Introduction
Activities of the San Antonio Chapter of ASA involving K-12 students and
teachers are presented. These include (1) the Texas Prefreshman Engineering
Program (PREP) designed to encourage females and minorities to enter science
and engineering careers, (2) Student & Teacher Collaborative Projects in
Problem Solving Using Data Analysis, and (3) Statistics Projects at MacArthur
High School. These experiences are designed to strengthen the statistics and
computer skills (using BUSINESS MYSTAT) of students who are involved in
independent research projects for science fairs and statistics project/poster
contests. A "top-down" approach is used which emphasizes starting with
meaningful research questions and introducing new concepts as the need
arises. The conceptual framework involves the Big Four Ideas of (1)
Prediction, (2) Uncertainty, (3) Modeling, and (4) Optimization. A General
Linear Model approach is used, starting with mutually exclusive categorical
models with least-squares solutions that yield "cell means". Then more
complex models are developed to investigate interactions among variables.
The major goal of the activities described below is to empower high school
students (and their teachers) to make effective use of the combined power of
a prediction model (regression, linear model) approach and computers in data
analysis for practical research. Probability, statistics and computer topics
are introduced when needed. This approach possesses several important
advantages over the traditional sequence in introductory statistics
instruction:
-- Students will have less to learn,
because many of the "standard" statistical analysis
procedures developed before the availability of
high-speed computers can be accomplished with fewer ideas.
-- Students will have more power to
solve new problems, since they will be able to specify
new models for unique problems.
-- Students will be able to solve
more problems with less computational burden, since
the use of statistical software packages allows for solutions
to complex prediction problems.
Background
.....................
============= ABOUT 200 LINES ARE CUT OUT HERE =============
.....................
Selected References
American Association for the Advancement of Science. Science for All
Americans. Washington, D.C.: AAAS, 1989.
American Statistical Association. Guidelines for the Teaching of Statistics
K-12 Mathematics Curriculum. Alexandria, VA: ASA, 1991.
Burrill, G., and J. Burrill. (Eds.). Data analysis and Statistics Across the
Curriculum. Reston, VA: National Council of Teachers of Mathematics,
1991.
Corwin, R., and S.J. Russell. Used Numbers: Real Data in the Classroom.
Palo Alto, CA: Dale Seymour Publications, 1990.
Foerster, Paul A. Precalculus with Trigonometry: Functions and Applications.
Menlo Park, CA: Addison-Wesley, 1986.
Fountain, Robert L. and Joe H. Ward, Jr. Regression Models and Software
Packages: Synthesizing Traditional Procedures in a One-semester
Statistics Course. Presented at ASA Winter Conference at
Louisville, KY, 1992.
Hale, Robert L., and Jeffrey W. Steagall. Business MYSTAT Statistical
Applications (DOS Edition). Cambridge, MA: Course Technology, Inc., 1990.
Laughlin, Margaret A., H. Michael Hartoonian, and Norris M. Sanders. From
Information to Decision Making: New Challenges for Effective Citizenship.
Washington, D.C.: National Council for the Social Studies, 1989.
Moore, David S., and George P. McCabe. Introduction to the Practice of
Statistics, Second Edition, New York, NY: W.H. Freeman, 1993.
(This book and supplementary materials accompany the Telecourse
videotape series Against All Odds: Inside Statistics available from
The Annenberg Project, 1-800-LEARNER. These 26, 30-minute tapes are
excellent and are frequently shown on PBS.)
National Council of Teachers of Mathematics. Curriculum and Evaluation
Standards for School Mathematics. Reston, Va.: NCTM, 1989.
Ward, Joe H., Jr., and Paul A. Foerster. Integrating Statistics into the
Secondary Curriculum. Proceedings of the Third International Conference
on Teaching Statistics. ISI Permanent Office, 428 Prinses Beatrixlaan,
PO Box 950, 2270 AZ Voorburg, The Netherlands, 1991.
Ward, Joe H., Jr., and Earl Jennings. Introduction to Linear Models.
Englewood Cliffs, NJ: Prentice-Hall, 1973.
Ward, Joe H., Jr. Problem Solving Through Data Analysis. San Antonio, TX:
Texas Prefreshman Engineering Program (TexPREP), 1991.
Quantitative Literacy Series
Gnanadesikan, M., R.L. Scheaffer, and J. Swift. The Art and Techniques of
Simulation. Palo Alto, CA: Dale Seymour Publications, 1987.
Landwehr, J.M., and A.E. Watkins. Exploring Data. Palo Alto, CA: Dale Seymour
Publications, 1986.
Landwehr, J.M., J. Swift, and A.E. Watkins. Exploring Surveys and Information
from Samples. Palo Alto, CA: Dale Seymour Publications, 1987.
Newman, C.M., T.E. Obremski, and R.L. Scheaffer. Exploring Probability.
Palo Alto, CA: Dale Seymour Publications, 1987.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Dictionaries of terms?
=====================James Ssemakula, 25 Jan 1996========ssc
Message-ID: <960125154430.25830d16@ucrac1.ucr.edu>
From: James Ssemakula
Subject: Re: Encyclopaedia of Statistics Terms
F.H.C. Marriott 1990. A dictionary of statistical terms. Longman & John Wiley.
or Freund's Dictionary/outline of basic statistics
or Tietjen 1986 A topical dictionary of statistics
or Brian Everitt 1995 The Cambridge dictionary of statistics in the medical
sciences.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
... learn about Bayesian theory?
=====================Darren Wilkinson, 07 Sept 1995========ssm,sse
From: D J Wilkinson
Subject: Re: Wanted: introduction to Bayesian probability
Message-ID: <42m827$gvd@mercury.dur.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Klaus-Peter Schriefers (schriefe@x4u2) wrote:
: 1.) What books and articles are recommended to get an overview
: on the state of discussion on Bayesian probability theory?
The book by Bernardo and Smith, "Bayesian Theory", gives an overview of
much of Bayesian statistics, without getting very involved in
philosophical issues. It has an excellent bibliography.
: 2.) What newsgroups, WWW-, HTML- documents are there to the same end?
The Durham statistics guide to stats resources:
http://fourier.dur.ac.uk:8000/stats/other.html
has a slight Bayesian bias. Also, E.T. Jaynes's book is on the web:
http://www.math.albany.edu:8008/JaynesBook.html
This is well worth a look if you're a physicist.
: 3.) What applications of Bayesian techniques to physics are known?
See Jaynes's book for a few examples.
: 4.) What other schools of thought exist in the field of probability
: theory?
There are the frequentist and likelihood schools (say no more!). There
are then various degrees of Bayesianism, starting with the
non-informative Bayesians (my choice of words :-) ), who use reference
analysis and non-informative priors, then the max-ent people, then the
genuine subjective Bayesians, such as Savage and Lindley, and then the
extreme subjectivists, such as Bruno de Finetti and Michael Goldstein.
: 5.) Since I got contact to subjective probability by a book of
: B.d.Finetti (Probability Theory) who has a special formalism
: and interpretation I would like to know about reactions to
: his approach.
Most "real" Bayesians consider de Finetti's approach to be the most
complete and compelling account of the subjectivist point of view.
However, many practicing Bayesians find his approach too difficult to
carry out in practice, and since they are most familiar with a
probabilistic approach to statistics, prefer to continue using it.
However, the work on Bayes linear methods tries to effectively
operationalize the ideas of de Finetti. See:
http://fourier.dur.ac.uk:8000/stats/bd/
for more details.
--
Darren
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
What's a good random-number generator?
==========================Herman Rubin, 26 Sep 1994======ssm,maa,sn
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Newsgroups: sci.stat.math,comp.ai.alife,sci.nonlinear
Subject: Re: Fast random number generator, binomial pdf
Message-ID: <366fd4$jrt@b.stat.purdue.edu>
In article <19940926100937.Patrick.Onghena@po.psy.kuleuven.ac.be>,
Patrick Onghena wrote:
>In Article <361v2v$b2m@carbon.denver.colorado.edu> "jrothman@carbon.denver.colorado.edu (Jay Rothman)" says:
>> Dr A. Kleczkowski (ak133@cus.cam.ac.uk) wrote:
>> : I am looking for a fast (very fast), reliable random number generator,
>> : preferably in C (I am using Borland C++ and standard SUN cc):
>> :
>> :
>> :
>> 1> Simulations run by students in my simulation class using the BC++ RNG
>> have exhibited anomalies due to the RNG (as one would expect from RNGs
>> packaged in compilers) - for whatever reason the RNG in TurboPascal fared
>> better. See Law & Kelton, Simulation Modeling and Analysis, McGraw-Hill,
>> 2nd edition, page 454 for a RNG in C based on the FORTRAN code of Marse
>> And Roberts (1983) that has tested well.
>I also obtained good results with the RNG of Turbo Pascal. Although their
>mixed congruential algorithm (multiplier 134,775,813, increment 1, and
>modulus 2**32) has its weaknesses (like any congruential algorithm), it is
>good enough for most applications and the implementation is fast.
>Onghena, P. (1993). A theoretical and empirical comparison of mainframe,
> microcomputer, and pocket calculator pseudorandom number generators.
> Behavior Research Methods, Instruments, & Computers, 25, 384-395.
All known reasonably fast algorithms are HIGHLY suspect. There are
cryptographically strong procedures, but they are too expensive.
Probably the best for production use, assuming enough memory and that
predictable memory accesses are fast, is the word Tausworthe method.
In this, one sets
x[n] = x[n-j] OP x[n-k],
where OP is full-word integer addition, or XOR, and j and k are appropriately
chosen. For vector processors, both j and k should be large. An example
would be to use j=460 and k=607; other Mersenne primes can be used for the
larger one. These have been shown to be congruential generators for huge
bases. It has been suggested that several of these with different Mersenne
primes should be XORed; another possibility is to take physical random numbers,
which need not be outstanding, and XOR them with the pseudo-random ones when
used. But make sure the period of the stored physical random numbers is not
too close to a small multiple of a power of 2.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
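A minimal Python sketch of the recurrence described above,
x[n] = x[n-j] OP x[n-k], with OP taken as full-word addition modulo 2**32
and the lags j=460, k=607 from the post. The names lcg_next and
LaggedGenerator are made up for illustration. Filling the 607-word table
from the congruential generator quoted earlier (multiplier 134,775,813,
increment 1, modulus 2**32) is an assumption of this sketch; the post does
not say how the table should be initialised. Python is used for clarity,
not for speed.

```python
MASK32 = 0xFFFFFFFF  # keep every result inside one 32-bit word


def lcg_next(x):
    """One step of the mixed congruential generator quoted in the thread:
    multiplier 134,775,813, increment 1, modulus 2**32."""
    return (134775813 * x + 1) & MASK32


class LaggedGenerator:
    """x[n] = x[n-j] + x[n-k] (mod 2**32) held in a circular table of k words."""

    def __init__(self, seed, j=460, k=607):
        if not 0 < j < k:
            raise ValueError("need 0 < j < k")
        self.j, self.k = j, k
        # Assumption: the first k words come from the LCG above.
        self.state = []
        x = seed & MASK32
        for _ in range(k):
            x = lcg_next(x)
            self.state.append(x)
        self.pos = 0  # index of the oldest word, x[n-k]

    def next_word(self):
        oldest = self.state[self.pos]                              # x[n-k]
        lagged = self.state[(self.pos + self.k - self.j) % self.k]  # x[n-j]
        value = (lagged + oldest) & MASK32
        self.state[self.pos] = value  # the oldest word is replaced by x[n]
        self.pos = (self.pos + 1) % self.k
        return value


gen = LaggedGenerator(seed=12345)
sample = [gen.next_word() for _ in range(5)]
```

Replacing the addition with ^ gives the XOR variant mentioned in the post.
The table costs k words of memory, which is the "assuming enough memory"
condition stated above.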
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Moments: definitions
=====================Rich Ulrich, 25 Aug 1995========ssm
Subject: Re: Moments
Message-ID: <41ldbc$4av@usenet.srv.cis.pitt.edu>
Jacob Galley (gal2@kimbark.uchicago.edu) wrote:
: This is what I think I know about moments: The first moment of a
: population is the mean; the second moment is the variance; the
: third moment is the skewness; and the fourth moment is the kurtosis.
That is close. The simple series of (raw) moments runs, Average of:
X, X^2 (that is read, X-squared), X^3, X^4, X^5, ... etc., going
as high as anybody dreams.
The second CENTRAL moment is what is labeled the variance, i.e., the
average of (X-mean)^2. The skewness comes from the third central moment,
an odd power; and beyond that, geometrical interpretation gets fuzzy,
with kurtosis at the fourth and the unnamed things that are higher.
One place you might look in the indices of books on statistics is
the `Method of moments' for estimating parameters.
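The distinction between raw and central moments can be sketched in a few
lines of Python. The function names and the data are made up for
illustration. Dividing the third and fourth central moments by powers of
the variance is the usual standardisation for skewness and kurtosis; it is
an addition here, not something stated in the post. All averages divide by
n (population form).

```python
def raw_moment(xs, r):
    """Average of X**r."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """Average of (X - mean)**r."""
    mean = raw_moment(xs, 1)
    return sum((x - mean) ** r for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = raw_moment(data, 1)           # first raw moment
variance = central_moment(data, 2)   # second central moment

# Usual standardised forms (an assumption of this sketch):
skewness = central_moment(data, 3) / variance ** 1.5
kurtosis = central_moment(data, 4) / variance ** 2

# The second raw and second central moments differ by mean**2.
assert abs(variance - (raw_moment(data, 2) - mean ** 2)) < 1e-12
```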
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Equations for the ? distribution?
- Numerical Recipes, online. Normal.
===================Lars Gregersen, 23 Oct 1995========ssm
From: projlg@ktbar96.kt.dtu.dk (Lars Gregersen (sbj))
Subject: Re: Looking for src code in Numerical Recipes
Message-ID: <46fnhn$18g@news.uni-c.dk>
Numerical recipes is on the net, try:
http://cfata2.harvard.edu/nr/nrhome.html
The book can be downloaded in Postscript or Acrobat format. From
this it should be possible to extract the source code.
=====================Chuck Haas, 26 Mar 1996========sse
Message-ID:
From: haascn@dunx1.ocs.drexel.edu (chuck haas)
Subject: Re: cdf of multivariate normal
>I am searching for a good and fast algorithm to approximate the cdf of a
>multivariate normal. Does there exist an algorithm that handles a 2-, 3-, 4-
>and more-variate normal distribution? Or are there different algorithms for
>each dimension? I am interested in references, but surely I like programs,
>too.
Some more recent references:
JV Terza and U Welland, "A Comparison of Bivariate Normal Algorithms", J
Statis. Comput. Simul. 39:115-27 (1991)
DG Divgi, "Calculation of Univariate and Bivariate Normal Probability
Functions", Annals of Statistics, 7:4:903-10 (1979)
W Albers and WCM Kallenberg, "A Simple Approximation to the Bivariate
Normal Distribution with Large Correlation Coefficient", Journal of
Multivariate Analysis 49:87-96 (1994)
DR Cox and N Wermuth, "A Simple Approximation for Bivariate and Trivariate
Normal Integrals", Int. Stat. Rev. 59:2:263-9 (1991)
Z Drezner and GO Wesolowsky, "On the Computation of the Bivariate Normal
Integral", J. Statist. Comput. Simul. 35:101-7 (1990)
I have tried to implement some of these algorithms (in MATLAB) and they are
fairly tricky. Good luck.
---
Charles N. Haas
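For the bivariate case only, a brute-force check is easy to write. The
sketch below is NOT one of the algorithms in the references above. It uses
the identity P(X<=h, Y<=k) = integral from -infinity to h of
phi(x) * Phi((k - rho*x) / sqrt(1 - rho^2)) dx for standard normal X and Y
with correlation rho, and evaluates the integral with Simpson's rule. The
function names, the step count, and the lower cut-off of -8 are
illustrative choices. It is slow compared with the published
approximations and does not extend to three or more dimensions.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bivariate_normal_cdf(h, k, rho, steps=2000, lower=-8.0):
    """P(X <= h, Y <= k) for standard normal X, Y with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("this sketch needs -1 < rho < 1")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    if h <= lower:
        return 0.0  # the mass below the cut-off is negligible
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        # density of X at x, times P(Y <= k | X = x)
        return normal_pdf(x) * normal_cdf((k - rho * x) / scale)

    width = (h - lower) / steps
    total = integrand(lower) + integrand(h)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * width)
    return total * width / 3.0


p_quadrant = bivariate_normal_cdf(0.0, 0.0, 0.5)
```

A convenient check is the known quadrant probability
1/4 + arcsin(rho)/(2*pi) at h = k = 0, which equals 1/3 for rho = 0.5.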
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Using harmonic/geometric means?
====================Aaron Brown, 15 Aug 1996=======ssm
From: aacbrown@aol.com (AaCBrown)
Message-ID: <4uvaf7$mk8@newsbf02.news.aol.com>
>> Under what conditions should the harmonic and geometric means be used?
The simplest answer for the geometric mean is when multiplying data makes
more sense than adding it. For example, if I make $50 today and lose $40
tomorrow, I made an average of $5 per day. It makes sense to add dollars.
But if my mutual fund makes +50% this year and -40% next year, it does not
make sense to add these numbers. I did not get a +10% return over the two
years but a -10% return. In this case the geometric mean of the wealth
ratios (1.5 and 0.6) is a more useful measure than the arithmetic mean.
Similarly a harmonic mean makes sense when the inverse of the data is the
relevant variable. If I drive 100 kph for half a trip and 25 kph for the
other half my average speed is 62.5 kph (arithmetic mean) if "half" means
half the time but 40 kph (harmonic mean) if "half" means half the
distance.
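The numbers in these two examples can be checked with Python's standard
statistics module (geometric_mean needs Python 3.8 or later). The variable
names are made up for illustration.

```python
import statistics

# Dollars add, so the arithmetic mean is the natural summary.
daily_dollars = [50, -40]
mean_dollars = statistics.mean(daily_dollars)            # 5 dollars per day

# Returns multiply, so work with the wealth ratios.
wealth_ratios = [1.5, 0.6]
two_year_growth = wealth_ratios[0] * wealth_ratios[1]    # about 0.9, a 10% loss
yearly_ratio = statistics.geometric_mean(wealth_ratios)  # sqrt(0.9), under 1

# Speeds over two halves of a trip.
speeds_kph = [100, 25]
equal_time_speed = statistics.mean(speeds_kph)               # 62.5 kph
equal_distance_speed = statistics.harmonic_mean(speeds_kph)  # 40 kph
```

Two years at the geometric-mean ratio reproduce the actual two-year growth
of 0.9, which is why it is the useful "average" return here.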
If there is a big difference between the different means then you likely
have one or more "outliers" that are very different from the other values.
In this case it often makes more sense to report a mean (of either sort)
for the bulk of the data and note the existence of the outliers. There may
be no single number that adequately represents the data.
Aaron C. Brown
New York, NY
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
ML Maximum likelihood vs other statistics
=====================Rich Ulrich, 25 Oct 1995========ssc
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Was: How to normalize data?
Message-ID: <46m0jk$k9i@usenet.srv.cis.pitt.edu>
Bill Simpson (wsimpson@uwinnipeg.ca) wrote:
: Why would anyone want to do a transformation these days?
: E.g. fitting a simple y=f(x)+error model. Why not just find a reasonable
: model form for f(x) and a reasonable distribution for the error, and then
: fit the model by maximum likelihood?
: It seems to me that transformations were useful in the days before
: easy maximum likelihood computation. Those days are over.
<< original question, revised answer ... >>
Are you asking, "Why do people still use Ordinary Least
Squares (OLS) analyses when maximum likelihood computations
could avoid, say, the need for transformation beforehand?"
Here is how I proceed with problematic data:
I look at the same time for a reasonable transformation, which
is one that makes sense, considering where the data come from,
especially in the regard of leaving a reasonable
distribution for the error. A reason for doing this is that it
should allow for a direct, OLS solution, which gives all
those useful, subsidiary statistics - Means for groups;
correlations among variables. These things only make decent
sense, after my choice of metric (transformations) has gotten
rid of really extreme outliers.
When I know what my metric is, I can look at a scatter-plot (say)
and see that there are still some outliers - so maybe my model is
INVALID if I don't get rid of them. If I just plug my numbers
into the statistical model and pray, then who needs a statistician?
So, what do you have in mind? I do not look forward to "easy
maximum likelihood computation" that replaces the examination
of data.
Where MLE provides a better STATEMENT of the problem, then MLE
solutions should be pursued. But a simple transformation, with
OLS solution, seems to me to be preferable to a direct MLE
solution that merely buries the transformation in unintelligible
statistics (and computerized computations that STILL may take 10
or 100 times as long).
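A small illustration of the "simple transformation, with OLS solution"
route, under assumptions that are not in the post: the model is
y = a * exp(b*x) with multiplicative error, so log(y) is linear in x. If
that error is log-normal, the OLS fit to log(y) is also the maximum
likelihood fit of the intercept and slope on the log scale, so nothing is
lost by transforming. The data below are hypothetical and noise-free, and
ols_line is a made-up helper name.

```python
import math


def ols_line(xs, ys):
    """Closed-form least-squares intercept and slope for y = c + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


xs = [0, 1, 2, 3, 4, 5]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # hypothetical data, a=2, b=0.5

# Transform, then fit by ordinary least squares on the log scale.
intercept, slope = ols_line(xs, [math.log(y) for y in ys])
a_hat = math.exp(intercept)  # back on the original scale
```

The fitted line, the group means, and the residuals are all on the log
scale, which is where the "useful, subsidiary statistics" mentioned above
can be read off directly.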
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Why Pearson's test, for contingency table?
======================Rich Ulrich, 04 Jun 1996=======sse
From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Goodness-of-fit statistics
Message-ID: <4p1hra$fqj@usenet.srv.cis.pitt.edu>
Rich Strauss (y8res@ttacs1.ttu.edu) wrote:
Strauss asked about Pearson's chi-square statistic for goodness of fit,
: (1) Is there any rationale, other than historical convenience, for using the
: particular weighting scheme of Pearson's statistic?
Let me recommend to you an article from a few months ago,
"A single general method for the analysis of cross classified data:
Reconciling [...]", Leo Goodman, JASA 96:[my ref. here is mangled.
Next it says 443:408-428]. This discusses not only Pearson's test
but also the maximum likelihood log-linear test, which is fairly common,
as well as Yule's test and the whole family.
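One common way to write such a family is the Cressie-Read power-divergence
statistic, 2/(lam*(lam+1)) * sum of O*((O/E)^lam - 1), where O and E are
observed and expected counts. With lam = 1 it reduces to Pearson's
chi-square, and the limit lam -> 0 gives the likelihood-ratio (log-linear)
statistic. The sketch below is a plain-Python illustration with
hypothetical counts; the function name is made up, and the case lam = -1
(another limit) is left out for brevity.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read statistic for matched lists of observed/expected counts."""
    if lam == -1:
        raise ValueError("lam = -1 is a separate limiting case, not handled")
    if lam == 0:
        # Limit lam -> 0: the likelihood-ratio statistic G^2.
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    return (2.0 / (lam * (lam + 1.0))) * sum(
        o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))


observed = [18, 22, 30, 30]       # hypothetical counts
expected = [25.0, 25.0, 25.0, 25.0]

pearson = power_divergence(observed, expected, 1)
g_squared = power_divergence(observed, expected, 0)

# Direct form of Pearson's statistic, for comparison.
pearson_direct = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The two forms of Pearson's statistic agree only when the observed and
expected counts have the same total, as they do here.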
The family was also mentioned in Agresti's _Categorical data
analysis_ , which cites Cressie and Read for introducing it as
"power divergence" statistics.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
... know where to round-off numbers?
==========================Bob Wheeler, 31 May 1996=======ssc
Subject: Re: rounding error
Message-ID: <31AF6062.531A@echip.com>
> I am working on some county population estimates based on state totals and
> have found that I have rounding error. For example, in a county I have
> 326.588 people, and I can't very well have .588 people. If I simply
> delete the .588 people and the other fractions, my state population will
> be off. Does anyone know of any easy methods I could use to adjust the
> numbers without changing the state total?
--
Optimal (or efficient) rounding has been studied, and is used on
problems like yours. I don't have a direct reference, but look in
CIS for papers by Friedrich Pukelsheim. He is concerned with optimal
experimental design, in pursuit of which he cites, in one of his papers,
the literature that will interest you. It is possible that you can
find it by searching CIS for "efficient rounding," but this may
be Pukelsheim's coinage.
Bob Wheeler, ECHIP, Inc.
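One simple scheme that keeps the total fixed is largest-remainder
rounding: round every figure down, then hand the leftover units to the
figures with the largest fractional parts. This is offered as a sketch of
a common approach, not as the method in the papers mentioned above. The
function name is made up, and apart from 326.588 the county figures are
hypothetical.

```python
import math


def round_preserving_total(values, total=None):
    """Round each value to an integer so that the results sum to `total`."""
    if total is None:
        total = round(sum(values))
    floors = [math.floor(v) for v in values]
    shortfall = total - sum(floors)  # units still to be handed out
    if not 0 <= shortfall <= len(values):
        raise ValueError("total is not reachable by rounding each value")
    # Indices ordered by fractional part, largest first.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i], reverse=True)
    result = list(floors)
    for i in order[:shortfall]:
        result[i] += 1
    return result


counties = [326.588, 120.3, 53.112]   # state total is 500
rounded = round_preserving_total(counties)
```

Here the single leftover unit goes to the first county, giving 327, 120
and 53, which still sum to 500.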
From: knight@unb.ca
Date: Thu, 30 May 1996 15:25:33 GMT
Message-ID:
Unkind question: What is the standard error?
Rounding to 326 or 327 suggests accuracy in the last digit.
If the standard errors be more than 10, is this honest or
should one round to 320 or 330 ?
I.e., should the problem perhaps be not how to
round the .588, but how to round the *6*.588?
bill knight / university of new brunswick / canada
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Document by Rich Ulrich. E-mail to wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/stats99.html