7/11 What's a good formal measure of how well a distribution is approximated
by a normal of the same mean and variance?
\_ least squares?
\_ Uh, the third moment, for starters.
\_ How valid is this as a statistical measurement? I.e. if
distr. A's skew is higher than distr. B's skew, is A expected
to deviate from normal more than B on other parameters as well?
         \_ to see how bad this measure is, consider the distribution
            delta(x) (which is infinite at 0 and 0 everywhere else)
            and N(0,100), a zero-mean gaussian with variance 100.
            the former is the narrowest distribution you'll find,
            the latter is very wide. yet their third moments are both
            0. a very bad measure indeed. in fact, all zero-mean even
            distributions have all their odd moments equal to 0. -ali
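            \_ fwiw, a quick numeric check of ali's point (a sketch,
               assuming python with numpy; a point mass at 0 stands
               in for delta(x)):

                 import numpy as np

                 rng = np.random.default_rng(0)

                 # point mass at 0 (stand-in for delta(x)): every draw is 0
                 delta_samples = np.zeros(100000)
                 # zero-mean gaussian with variance 100 (std dev 10)
                 gauss_samples = rng.normal(0.0, 10.0, size=100000)

                 # odd central moments are ~0 for both, even though the
                 # two distributions could hardly differ more
                 print(np.mean(delta_samples ** 3))           # exactly 0
                 print(np.mean((gauss_samples / 10.0) ** 3))  # ~0 up to noise
                 print(np.mean((gauss_samples / 10.0) ** 5))  # ~0 as well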
            \_ Life was much more pleasurable back in the old
               days, when all I needed to do to smack Rahimi's
               pompous ass was go upstairs :) -muchandr
            \_ This isn't applicable since I'm fixing the variance as
               well. Of course the (n+1)'th moment is relatively
               useless if the first n moments are way off.
               \_ the point is that if you have an even, symmetric,
                  zero-mean distribution, the third moment will be
                  0 even if your second moment matches. You really
                  need to look at all moments to test for equality.
                  You should really look at mutual information (what
                  emin suggests) if you don't have an application in
                  mind and just want a similarity measure with nice
                  properties.
\_ noted, thx
   \_ You could also look at the Kullback-Leibler distance, which is
      also known as relative entropy. This has some nice properties.
      For example, if you build a Huffman code for distribution q
      and use it to compress a source whose true distribution is p,
      the expected overhead from using the wrong distribution is
      D(p||q) (to within one bit), where D(p||q) is the relative
      entropy. I don't remember the details, but I think D(p||q)
      also comes up in hypothesis testing where the true distribution
      is p but you think it is q. If you tell us what application
      you want a difference measure for, we might be able to give
      better suggestions. P.S. you could also use p-norms with
      p = 1, 2, or infinity. -emin
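      \_ a minimal sketch of the coding-overhead claim on a toy
         discrete source (assuming python with numpy; an ideal code
         stands in for Huffman, which is within one bit of it):

           import numpy as np

           p = np.array([0.5, 0.25, 0.125, 0.125])  # true source dist
           q = np.array([0.25, 0.25, 0.25, 0.25])   # dist the code assumes

           H_p  = -np.sum(p * np.log2(p))  # ideal rate: H(p) = 1.75 bits
           H_pq = -np.sum(p * np.log2(q))  # achieved rate: H(p,q) = 2 bits

           # the overhead is exactly the relative entropy D(p||q)
           D_pq = np.sum(p * np.log2(p / q))
           print(H_pq - H_p, D_pq)          # both print 0.25 (bits/symbol)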
      \_ I'm trying to do an approximate analysis of a hideously
         intractable numeric process, and this distribution, which
         looks (graphically) "somewhat" normal, sits about half-way
         inside the bigger picture; since it's a sum over a bunch of
         binomial coefficients, I was advised to assume normality
         asymptotically to simplify things, but I want some way of
         getting a feel for how far off I am. Yes, generic and not
         well-defined, but you really don't want to hear the gnarly
         details.
         \_ I'm not sure what you mean when you say "it's a sum over
            a bunch of binomial coefficients". However, if you are
            interested in the standardized sum
            S_n = (sum(1,n) X_i - n*mu) / (sigma*sqrt(n)),
            then the cdf of S_n will converge to the standard normal
            cdf at a rate of roughly 1/sqrt(n) (the Berry-Esseen
            theorem), provided the random variables X_i are reasonably
            well behaved (i.i.d. with a finite third moment). Look up
            Central Limit Theorems for the details. -emin
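            \_ one way to get a feel for the 1/sqrt(n) rate by direct
               computation (a sketch, assuming python with scipy;
               bernoulli X_i make the sum binomial, which at least
               rhymes with your binomial-coefficient setting):

                 import numpy as np
                 from scipy import stats

                 def max_cdf_gap(n, p=0.3):
                     # gap between the cdf of the standardized binomial
                     # sum and the standard normal cdf, at the jump points
                     k = np.arange(n + 1)
                     z = (k - n * p) / np.sqrt(n * p * (1 - p))
                     return np.max(np.abs(stats.binom.cdf(k, n, p)
                                          - stats.norm.cdf(z)))

                 for n in [10, 100, 1000, 10000]:
                     # the scaled gap should level off, i.e. the raw
                     # gap shrinks roughly like 1/sqrt(n)
                     print(n, max_cdf_gap(n), np.sqrt(n) * max_cdf_gap(n))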
            \_ Noted; although I have some doubts about the
               "reasonably well behaved" part.
   \_ one way to do stats is to always optimize the expected value of
      some cost function, and depending on what that cost function is,
      you pick the correct probabilistic entity to be extremised (for
      example, likelihood or entropy, or some other function). i think
      the sensible way to answer your question is "what is the cost
      you incur from using the wrong distribution" in terms of the
      cost function you're trying to optimize. i would say mutual
      information is a nice hint, but you really need to watch out
      for what you use the distribution for. -ali
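      \_ a toy version of that cost framing (a sketch, assuming
         python with numpy; squared error as the cost, so the optimal
         action under a distribution is its mean, and the gamma p and
         the value mu_q below are made up for illustration):

           import numpy as np

           rng = np.random.default_rng(0)

           def expected_cost(action, samples):
               # monte carlo estimate of E_p[(action - X)^2]
               return np.mean((action - samples) ** 2)

           p_samples = rng.gamma(2.0, 1.0, size=100000)  # the "true" p
           mu_p = p_samples.mean()                       # optimal action
           mu_q = 2.5           # mean of a hypothetical approximation q

           print(expected_cost(mu_p, p_samples))  # best achievable cost
           print(expected_cost(mu_q, p_samples))  # cost of acting per q
           # the gap is ~(mu_p - mu_q)^2: the price of the wrong dist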
\_ see above; what i'm looking for to start with is the cost
function, and it's painfully intractable, both mathematically
and computationally
         \_ it seems to me that you're saying you can't even SAMPLE
            from p(x), where p is the correct distrib? if you can
            sample from the dist, you can do what i'm suggesting.
            \_ Dude. If you can sample independently from the
               distribution, the distribution is your bitch. You need
               about twelve samples to approximate it to the degree
               that is generally needed (source: MacKay).
\_ The problem is that I'm looking for asymptotic behaviour,
and while I can sample at my test sizes at the rate
of 5-20 hours per sample, sampling at any decent size
can take years per sample
            \_ Aren't you in compilers, ali? Stop studying AI!
   \_ The Kolmogorov-Smirnov test might be useful as well -- it tests
      whether two data sets come from the same parent distribution,
      or whether a data set is consistent with a predicted
      distribution.
      \_ noted, thx
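      \_ a quick sketch of the K-S test in scipy (assuming python;
         the data here is synthetic, just to show the calls):

           import numpy as np
           from scipy import stats

           rng = np.random.default_rng(0)
           data = rng.normal(0.0, 10.0, size=500)

           # one-sample: is the data consistent with N(0, 100)?
           stat, pval = stats.kstest(data, "norm", args=(0.0, 10.0))
           print(stat, pval)    # large p-value: no evidence against it

           # two-sample: do two data sets share a parent distribution?
           other = rng.normal(0.0, 10.0, size=500)
           print(stats.ks_2samp(data, other))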