Entry 18641 (Berkeley CSUA MOTD)

2025/07/12 [General] UID:1000 Activity:popular

7/12

2000/7/11-12 [Computer/SW/Compilers] UID:18641 Activity:very high

7/11    What's a good formal measure of how well a distribution is approximated
        by a normal of the same mean and variance?
        \_ least squares?
        \_ Uh, the third moment, for starters.
           \_ How valid is this as a statistical measurement? I.e. if
              distr. A's skew is higher than distr. B's skew, is A expected
              to deviate from normal more than B on other parameters as well?
              \_ to see how bad this measure is, consider the distribution
                 delta(x) (which is infinity at 0 and 0 everywhere else)
                 and N(0,100) which is a zero mean gaussian with variance
                 100. the former is the narrowest distribution you'll find,
                 the latter is very wide. yet their third moments are both 0.
                 a very bad measure indeed. in fact, all 0 mean even functions
                 have the same odd moments. -ali
                        \_ Life was much more pleasurable back in the old
                           days, when all I had to do to smack Rahimi's
                           pompous ass was go upstairs :) -muchandr
                 \_ This isn't applicable since I'm fixing the variance as
                    well. Of course the (n+1)th moment is relatively useless
                    if the first n moments are way off.
                    \_ the point is that if you have an even symmetric
                       zero mean distribution, the third moment will be
                       0, even if your second moments match. You really
                       need to look at all moments to test for equality.
                       You should really look at mutual information (what
                       emin suggests) if you don't have an application in mind
                       and just want a similarity measure with nice properties.
                       \_ noted, thx
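A minimal sketch of the moment argument above, using only the Python standard library: it numerically integrates the central moments of two zero-mean, unit-variance densities, a uniform on [-sqrt(3), sqrt(3)] and a standard normal. The two densities, the helper name `central_moment`, and the midpoint-rule integrator are illustrative choices, not anything from the thread. Both third moments come out as 0, so skewness alone cannot tell the two apart, while the fourth moments (1.8 versus 3) differ.

```python
import math
from typing import Callable

def central_moment(pdf: Callable[[float], float], k: int,
                   lo: float, hi: float, steps: int = 50_000) -> float:
    """Approximate E[(X - mean)^k] for a density supported on [lo, hi]
    using the midpoint rule with `steps` subintervals."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    mean = sum(x * pdf(x) for x in xs) * h
    return sum((x - mean) ** k * pdf(x) for x in xs) * h

A = math.sqrt(3.0)  # uniform on [-A, A] has variance A**2 / 3 = 1

def uniform_pdf(x: float) -> float:
    return 1.0 / (2.0 * A) if -A <= x <= A else 0.0

def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Both densities have mean 0 and variance 1; truncating the normal at
# +/-12 loses a negligible amount of probability mass.
for name, pdf, lo, hi in [("uniform", uniform_pdf, -A, A),
                          ("normal", normal_pdf, -12.0, 12.0)]:
    m2, m3, m4 = (central_moment(pdf, k, lo, hi) for k in (2, 3, 4))
    print(f"{name:8s} var={m2:.4f} third={m3:.4f} fourth={m4:.4f}")
```

Matching mean, variance and skewness still leaves the shape (here, the tails) unconstrained, which is why any single moment is a weak similarity measure.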
        \_ You could also look at the Kullback-Leibler distance, which is
           also known as relative entropy.  This has some nice properties.
           For example, if you build a Huffman code for distribution q
           and use it to compress a source with distribution p, the
           expected overhead for using the wrong distribution is roughly
           D(p||q) bits per symbol (exactly D(p||q) if the code lengths
           are the ideal -log2 q(x)), where D(p||q) is relative entropy.
           I don't remember the details, but I think that D(p||q) also
           comes up if you are doing hypothesis testing where the true
           distribution is p but you think the distribution is q.  If you
           tell us what application you want a difference measure for, we
           might be able to give better suggestions.  P.S., you could also
           use p-norms where p = 1, 2, or infinity. -emin
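A small sketch of the coding interpretation, assuming two made-up three-symbol distributions: p (the true source) and q (the distribution the code was designed for). It uses the ideal code lengths -log2 q(x); for this dyadic q a Huffman code has exactly those lengths (2, 2 and 1 bits). The helper names are hypothetical. The average overhead over the entropy H(p) equals D(p||q), and the same pair is also compared with the 1-, 2- and infinity-norms.

```python
import math
from typing import Sequence

def kl_divergence_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p||q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy_bits(p: Sequence[float]) -> float:
    """H(p) in bits: the best achievable average code length for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def expected_code_length(p: Sequence[float], q: Sequence[float]) -> float:
    """Average bits/symbol when the source is p but the code uses the
    ideal lengths -log2 q(x) designed for q (the cross-entropy)."""
    return sum(pi * -math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def lp_distance(p: Sequence[float], q: Sequence[float], order: float) -> float:
    """The usual p-norm of the difference, with `order` playing the role of p."""
    diffs = [abs(pi - qi) for pi, qi in zip(p, q)]
    if math.isinf(order):
        return max(diffs)
    return sum(d ** order for d in diffs) ** (1.0 / order)

p = [0.5, 0.25, 0.25]   # true source distribution (made up)
q = [0.25, 0.25, 0.5]   # distribution the code was built for (made up)

print(expected_code_length(p, q) - entropy_bits(p))   # coding overhead
print(kl_divergence_bits(p, q))                       # relative entropy
print([lp_distance(p, q, r) for r in (1, 2, math.inf)])
```

Here the overhead and D(p||q) are both exactly 0.25 bits per symbol. D(p||q) is not symmetric in general and is infinite when q assigns zero probability where p does not; the sketch assumes that never happens.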
           \_ I'm trying to do an approximate analysis of a hideously
              intractable numeric process, and this distribution, which
              is (graphically) "somewhat" normal-looking, is about half-way
              inside the bigger picture; since it's a sum over a bunch of
              binomial coefficients, I was advised to assume normality
              asymptotically to simplify things, but I want some way of
              getting a feel for how off I am. Yes, generic and not
              well-defined, but you really don't want to hear the gnarly
              details.
              \_ I'm not sure what you mean when you say "it's a sum over
                 a bunch of binomial coefficients".  However, if you are
                 interested in the random variable S_n = (1/n)*sum(1,n) X_i,
                 where each X_i has mean mu and variance sigma^2, then the
                 cdf of the standardized average sqrt(n)*(S_n - mu)/sigma
                 will converge to the standard normal cdf, with error
                 roughly of order 1/sqrt(n), provided that the random
                 variables X_i are reasonably well behaved (e.g. i.i.d.
                 with a finite third moment).  Look up Central Limit
                 Theorems (and the Berry-Esseen theorem) for the details.
                 -emin
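A sketch of the sup-of-cdf-difference (Kolmogorov) distance between a Binomial(n, p) distribution and the normal with the same mean and variance, as one concrete way to measure how far off the normal approximation is. Treating the original poster's "sum over binomial coefficients" as a binomial distribution is an assumption made only for illustration, and the function names are hypothetical. Because the binomial cdf is a step function, the supremum is reached at a jump, so it is enough to compare the cdf just before and just after each integer k.

```python
import math

def binomial_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), 0 < p < 1; computed in log space
    so that large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def kolmogorov_distance_to_normal(n: int, p: float) -> float:
    """sup_x |F_binom(x) - Phi(x)| against the normal with the same mean
    and variance. Between jumps the binomial cdf is flat and Phi is
    increasing, so checking F(k-1) and F(k) against Phi(k) for every
    k = 0..n covers the whole real line."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    worst = 0.0
    cdf_before = 0.0  # F(k-1)
    for k in range(n + 1):
        cdf_after = cdf_before + binomial_pmf(n, k, p)  # F(k)
        phi = normal_cdf(k, mu, sigma)
        worst = max(worst, abs(cdf_before - phi), abs(cdf_after - phi))
        cdf_before = cdf_after
    return worst

for n in (25, 100, 400, 1600):
    d = kolmogorov_distance_to_normal(n, 0.5)
    print(f"n={n:5d}  distance={d:.5f}  sqrt(n)*distance={math.sqrt(n) * d:.3f}")
```

For p = 0.5 the distance shrinks as n grows while sqrt(n) times the distance stays near 0.4, consistent with the roughly 1/sqrt(n) rate mentioned above. For a real process, getting the exact cdf is of course the hard part.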
                 \_ Noted, although I have some doubts about the "reasonably
                    well-behaved" part.
        \_ one way to do stats is to always optimize the expected value of
           some cost function, and depending on what that cost function is,
           you pick the correct probabilistic entity to be extremised (for
           example, likelihood or entropy, or some other function).  i think
           the sensible way to answer your question is "what is the cost
           you incur from using the wrong distribution" in terms of the cost
           function you're trying to optimize. i would say mutual information
           is a nice hint, but you really need to watch out for what you
           use the distribution for. -ali
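A toy sketch of measuring "the cost you incur from using the wrong distribution": pick the action that minimizes expected cost under the approximation q, then score that action under the true distribution p and subtract the best achievable expected cost under p. The four-outcome distributions and the absolute-error cost are invented for the example, and the function names are hypothetical.

```python
from typing import Callable, Sequence

Cost = Callable[[int, int], float]

def expected_cost(dist: Sequence[float], cost: Cost, action: int) -> float:
    """E[cost(X, action)] where X takes value x with probability dist[x]."""
    return sum(px * cost(x, action) for x, px in enumerate(dist))

def best_action(dist: Sequence[float], cost: Cost, actions: Sequence[int]) -> int:
    return min(actions, key=lambda a: expected_cost(dist, cost, a))

def regret(p: Sequence[float], q: Sequence[float], cost: Cost,
           actions: Sequence[int]) -> float:
    """Extra expected cost, measured under the true p, from acting on q."""
    a_q = best_action(q, cost, actions)
    a_p = best_action(p, cost, actions)
    return expected_cost(p, cost, a_q) - expected_cost(p, cost, a_p)

def abs_error(x: int, a: int) -> float:
    return abs(x - a)

p = [0.1, 0.2, 0.3, 0.4]  # "true" distribution over outcomes 0..3 (made up)
q = [0.4, 0.3, 0.2, 0.1]  # approximation actually used (made up)
actions = range(4)

print(best_action(q, abs_error, actions), best_action(p, abs_error, actions))
print(regret(p, q, abs_error, actions))
```

With absolute-error cost the best action is a median, which is 1 under q and 2 under p, so acting on q costs an extra 0.4 in expectation. If the action is itself a distribution and the cost is log-loss, this same regret is exactly the relative entropy D(p||q).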
           \_ see above; what i'm looking for to start with is the cost
              function, and it's painfully intractable, both mathematically
              and computationally
              \_ it seems to me that you're saying you can't even SAMPLE from
                 p(x) where p is the correct distrib? if you can sample from
                 the dist, you can do what i'm suggesting.
                 \_ Dude.  If you can sample independently from the
                    distribution, the distribution is your bitch.  You need
                    about twelve samples to estimate expectations under it to
                    the degree that is generally needed (source: MacKay).
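A sketch of the sampling approach: estimate an expected cost by Monte Carlo under the "true" distribution and under its moment-matched normal, then compare. The Gamma(4, 1) stand-in for the intractable process, the tail-probability cost P(X > 8), and the helper name `mc_mean` are all assumptions for illustration. The standard error shrinks like 1/sqrt(n) regardless of the dimension of X, which is the basis of the "about a dozen samples" rule of thumb; for a rare event such as this tail probability, though, a dozen samples are far too few.

```python
import math
import random

def mc_mean(sample, f, n: int, rng: random.Random):
    """Monte Carlo estimate of E[f(X)] and its standard error from n >= 2
    independent draws produced by sample(rng)."""
    vals = [f(sample(rng)) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, math.sqrt(var / n)

SHAPE = 4.0                       # Gamma(shape=4, scale=1): mean 4, variance 4
MU, SIGMA = SHAPE, math.sqrt(SHAPE)
THRESHOLD = 8.0                   # hypothetical cost: indicator of X > 8

def true_sampler(rng: random.Random) -> float:
    return rng.gammavariate(SHAPE, 1.0)   # stand-in for the real process

def normal_sampler(rng: random.Random) -> float:
    return rng.gauss(MU, SIGMA)           # moment-matched normal approximation

def tail_cost(x: float) -> float:
    return 1.0 if x > THRESHOLD else 0.0

rng = random.Random(18641)        # fixed seed so runs are repeatable
for name, sampler in [("gamma (true)", true_sampler),
                      ("normal approx", normal_sampler)]:
    for n in (12, 100_000):
        est, se = mc_mean(sampler, tail_cost, n, rng)
        print(f"{name:14s} n={n:6d}  P(X > 8) ~ {est:.4f} +/- {se:.4f}")
```

The exact values are about 0.042 for the gamma and 0.023 for the normal, so for this particular cost the normal approximation is off by roughly a factor of two even though the mean and variance match.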
                 \_ The problem is that I'm looking for asymptotic behaviour,
                    and while I can sample at my test sizes at the rate
                    of 5-20 hours per sample, sampling at any decent size
                    can take years per sample.
           \_ Aren't you in compilers, ali?  Stop studying AI!
        \_ The Kolmogorov-Smirnov test might be useful as well -- used for
           testing if two data sets come from the same parent
           distribution, or whether a data set is consistent with a
           predicted distribution.
           \_ noted, thx
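A minimal pure-Python sketch of the Kolmogorov-Smirnov statistic D (the largest vertical gap between two cdfs), in a one-sample form against a given continuous cdf and in a two-sample form. It computes only D, not a p-value, and the function names are hypothetical. One caveat: if the normal's mean and variance are estimated from the same data being tested, the usual KS critical values no longer apply (the Lilliefors variant addresses this).

```python
import math
import random
from typing import Callable, Sequence

def ks_one_sample(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D = sup_x |F_n(x) - F(x)| for the empirical cdf F_n of data
    against a continuous cdf F."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """D = sup_x |F_a(x) - F_b(x)| for two empirical cdfs, handling ties
    by stepping past every copy of the current value in both samples."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(0)
uniform = [rng.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(2000)]
normal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(ks_one_sample(uniform, std_normal_cdf))  # uniform data vs N(0, 1)
print(ks_two_sample(uniform, normal))          # two data sets
```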

You may also be interested in these entries...

2014/1/14-2/5 [Computer/SW/Languages/C_Cplusplus] UID:54763 Activity:nil

1/14    Why is NULL defined to be "0" in C++ instead of "((void *) 0)" like in
        C?  I have some overloaded functions where one takes an integer
        parameter and the other a pointer parameter.  When I call it with
        "NULL", the compiler matches it with the integer version instead of
        the pointer version which is a problem.  Other funny effect is that
        sizeof(NULL) is different from sizeof(myPtr).  Thanks.
	...

2013/4/29-5/18 [Computer/SW/Languages/C_Cplusplus, Computer/SW/Compilers] UID:54665 Activity:nil

4/29    Why were C and Java designed to require "break;" statements for a
        "case" section to terminate rather than falling-through to the next
        section?  99% of the time people want a "case" section to terminate.
        In fact some compilers issue warning if there is no "break;" statement
        in a "case" section.  Why not just design the languages to have
        termination as the default behavior, and provide a "fallthru;"
	...

2012/12/18-2013/1/24 [Computer/SW/Languages/Perl] UID:54561 Activity:nil

12/18   Happy 25th birthday Perl, and FUCK YOU Larry Wall for fucking up
        the computer science formalism and setting compiler development
        back by at least a decade:
        http://techcrunch.com/2012/12/18/print-happy-25th-birthday-perl
        \_ I tried to learn Perl but was scared away by it.  Maybe scripting
           languages have to be like that in order to work well?
	...

2010/1/12-29 [Computer/SW/Apps/Media] UID:53627 Activity:kinda low

1/12    How do I get a job NOT related to internet DNS social network cloud
        twitter GOOG EC2 amazon API ???
        \_ A CS job not related to API?
        \_ Chip design, or maybe software that does chip design. What is
           your major? How about game developer?
        \_ DNS? DNS? What era do you live in? I agree that social network
	...

2009/5/6-14 [Computer/SW/Languages/Perl, Computer/SW/Languages/Web] UID:52961 Activity:kinda low

5/6     I'm sure you've seen web sites that distribute software by making
        a user fill out a form and then e-mailing the user a randomly
        generated link to the software that works just once. What software
        is used to do this? I'd like to distribute software in such a way.
        \_ "Software"?  What web server/web application environment
           are you using?
	...

2009/1/13-22 [Computer/Theory] UID:52367 Activity:kinda low

1/13    I am writing a commandline parser for a class and I could use some
        tips for algorithms to use. (The project is over and done so I am
        not cheating, but I am dissatisfied with my end result.) I STFW and
        didn't come up with too much I liked. I read the source for some
        shells like tcsh and that is *WAY* too complicated and relies on
        a lot of other code. I know that browsers and other apps have
	...

2008/5/2-8 [Computer/SW/Compilers] UID:49874 Activity:low

5/2     How do I get the L1/L2 cache size and cache line size on my machine?
        Can I find this stuff out at compile time somehow?
        \_ You aren't planning on running your code on any other processors?
        \_ May I ask what it is you want to achieve ultimately? If you don't
           know your architecture and want to find out dynamically, there are
           tools that can peek/poke to give you definitive answers, plus you get
	...

2007/11/30-12/6 [Computer/SW/Compilers, Computer/HW/CPU] UID:48719 Activity:moderate

11/29   From the CSUA minutes:
        - Next Gen Console
        -- If we have $1800 in our accounts, should we buy a console:
           4 votes passes.
        -- Console voting: 2 votes each, neither passes
           * 360 = 600, more games
	...

2006/11/10-12 [Computer/SW/Compilers] UID:45316 Activity:nil

11/10   Is there any way to get C/C++ compilers to automatically compile
        different code for different processors?  I'd like to be able to
        say something like:
          #if defined X86 ...
          #elif defined SPARC ...
          #else ...
	...

2006/8/25-28 [Computer/SW/Languages, Computer/SW/Compilers] UID:44149 Activity:nil

8/25    Why are iterators "superior" or more recently popular over the
        traditional method of using  for loops and indexing?
        \_ I guess it's because you can change an array to some other data
           structure (linked-list, tree, ...) without changing the loop code.
           \_ This is a limitation of your language, not the concept of looping
        \_ They handle multithreaded use cases better.
	...

2006/7/11 [Computer/SW/Compilers] UID:43629 Activity:nil

7/11    Is there a way to turn off specific warnings on the intel 9.0 C++
        compilers?  The man page says -wd[warning number] should suppress
        the warning, but that isn't working for me at all.  The only
        thing that works is just -w, but that suppresses ALL warnings.
        \_ grep -v warning-that-I-dont-care ...
	...