4/21 Cute problem: You are given n bits which resulted from flipping a
biased coin n times (you don't know the bias). The problem is to come
up with an algorithm which will produce as many unbiased independent
random bits from this string as possible (on average). You have no
other sources of randomness except the input bits. One possible
algorithm would be to pair up the bits: output 1 if the pair
is 01, output 0 if the pair is 10, and discard the pair
otherwise (00 or 11). This isn't optimal on average,
however. -- ilyas
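(A minimal sketch of the pairing trick in Python; the
function name is just illustrative:)

    def von_neumann_extract(bits):
        # Pair up the input bits: 01 -> output 1, 10 -> output 0,
        # 00 and 11 -> discard.  Unbiased for any bias p, since
        # P(01) == P(10) == p*(1-p).
        out = []
        for a, b in zip(bits[0::2], bits[1::2]):
            if a != b:
                out.append(b)
        return out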
\_ assume n is sufficiently large. First calculate the (biased)
   expected value over the n bits. Then, instead of pairing up
   bits, sample m bits at a time, and if there are more than
   (expected value * m) bits with 1, output 1, otherwise 0.
   Slide the sample "window" down and continue.
\_ How do you sample without any additional randomness?
-- ilyas
\_ a simple example of the sampling logic is to take
   bits 0~m, then 1~m+1, 2~m+2, etc. (see the sketch
   below). This does not require added randomness. A
   more sophisticated variation would vary the next
   sample depending on the previous 0 or 1 output
   (unbiased).
\_ The problem is that if you use more than n/m
   windows the resulting bits will not be independent
   anymore, yes? -- ilyas
\_ yes for the simple logic. what do i win for
spelling out a more sophisticated logic?
\_ 1 mid-sized kewpie doll.
\_ eh? expected value?
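(For concreteness, a rough Python sketch of the windowing
idea above -- my own code, not the poster's; the comment
notes the independence objection:)

    def window_extract(bits, m):
        # Threshold each length-m window against the empirical mean.
        p = sum(bits) / len(bits)
        out = []
        for i in range(len(bits) - m + 1):   # windows 0~m, 1~m+1, ...
            w = bits[i:i+m]
            out.append(1 if sum(w) > p * m else 0)
        # NB: consecutive windows share m-1 bits, so these
        # outputs are not independent, per the objection above.
        return out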
\_ I am interpreting "on average" here to be, "it doesn't have to
be perfect, but come up with something good".
\_ I think "optimal on average" has to mean something like it
produces more bits (on average) than any other such algorithm
that also works over the full range of possible biases. I don't
see how it could be optimal for each possible bias (e.g. to
be optimal in the case where there is no bias, it would have
to output n bits for any input, but this would obviously not
work if there is bias). - Johnny Von Neumann
\_ With all due respect Mr. von Neumann, if the bias is 1,
no algorithm will output any random bits at all. You simply
have no randomness to work with in that case. To put it
another way, the 'optimal' number of bits you should get on
average isn't fixed, but depends on the bias. The metaphor
here is 'refining oil'. The 'tainted' randomness you get as
input is like crude oil. The 'unbiased' randomness you
get at the output is like refined oil you get after processing
the crude. The amount of stuff that comes out depends on how
much crude you have to work with, and how 'crude' it is.
-- ilyas
\_ With all due respect, you don't seem to have understood
what I wrote. An algorithm can't be optimal for the
_unbiased_ case (in which case it would have to have
no waste product) and also work in other cases, unless
you could distinguish this case in advance.
\_ I can see why you would think so, but surprisingly,
it can. -- ilyas
\_ I think ilyas means "optimal" in the sense that any
   competing algorithm would produce fewer random bits
   over the entire space of possible input strings. The
   "optimal" algorithm is the one that produces the most
   random bits over the entire space of input strings,
   relative to all the other lame algorithms that produce
   fewer. Algorithms which produce ANY non-random bits
   are super-lame and do not qualify for consideration
   at all.
\_ Yes, this is what I think he means, too, and is
pretty much what I said above. He seems to be
claiming something stronger, which I still think
is impossible, but perhaps I have a blind spot. My
intuition is that you can do about twice as well
as the simple algorithm above.
\_ Interpret it as "the expected number of random bits produced by
   the algorithm, given the input distribution determined by the
   unknown bias, must be as high as possible." I mean "on average"
   precisely. It's good to think about why it is that the algorithm
   doesn't care what the input distribution is, essentially.
   -- ilyas
\_ How about something like this:
Suppose you saw k heads out of n flips.
Lexicographically order all n-choose-k
sequences of n flips that have k heads,
find the index i of the actual sequence
of observed flips amongst these n-choose-k
possibilities, and take the binary expansion
of i. This requires some fiddling when
n-choose-k isn't a power of two, though ---
maybe leaving out the last bit in i in this case.
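(A possible Python rendering of the above; the ranking is
the standard combinatorial number system, and the
power-of-two fiddling peels bits off the way the post
suggests -- roughly Elias's scheme. Names are mine:)

    from math import comb

    def rank_sequence(bits):
        # Lexicographic index of `bits` among all length-n
        # sequences with the same number of 1s.
        n, k = len(bits), sum(bits)
        r = 0
        for i, b in enumerate(bits):
            if b == 1:
                r += comb(n - 1 - i, k)  # sequences with 0 here sort first
                k -= 1
        return r

    def index_to_bits(r, M):
        # r is uniform on [0, M); extract unbiased bits even
        # when M is not a power of two.
        out = []
        while M > 1:
            t = M.bit_length() - 1           # 2**t <= M
            if r < (1 << t):
                out += [(r >> j) & 1 for j in range(t)]
                break                        # t unbiased bits
            r -= 1 << t                      # still uniform on the rest
            M -= 1 << t
        return out

    # usage: flips = [0, 1, 1, 0, 1, 0, 0, 1]
    #        index_to_bits(rank_sequence(flips),
    #                      comb(len(flips), sum(flips)))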
\_ This is one correct algorithm, well done. This is called
   a 'codebook based on the method of types', I think.
   Another algorithm: run the string through gzip.
   This algorithm and yours work for the same reason.
   In both cases you get about n*H(p) unbiased random bits
   out (which is the optimal number), where p is the unknown
   bias, and H is the binary entropy. The reason we don't
   care about the actual value of p is that the algorithms
   in question are 'universal source coding algorithms'.
   This is why people use gzip on just any old file and
   expect it to work well. Mr. von Neumann's contemporary
   by the name of Claude Shannon worked all this out in a
   famous paper. -- ilyas
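(For reference, the binary entropy as a quick helper --
mine, not from the thread:)

    from math import log2

    def H(p):
        # Binary entropy: unbiased bits extractable per flip.
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    # e.g. H(0.9) ~ 0.469, so 1000 flips of a 90/10 coin
    # yield about 469 unbiased bits at best.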
\_ What does "n-choose-k sequences" mean?
\_ It's a function often used in combinatorics. You can
google for its definition, and you can ask google to
compute it for you (e.g. "10 choose 5"). -- ilyas
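(In short: C(n, k) = n! / (k! (n-k)!), the number of
distinct length-n sequences containing exactly k heads.)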
\_ I'd get a better coin. It sounds like someone shaved yours. |