Berkeley CSUA MOTD 2011/12/23

Berkeley CSUA MOTD:2011:December:23 Friday <Thursday>


2011/12/23-2012/2/6 [Computer/Rants] UID:54271 Activity:nil

12/23   http://venturebeat.com/2011/12/22/uc-berkeley-google-apps
        Oh noes! What Would Bill Gates Do?
        \_ http://lauren.vortex.com/archive/000701.html
           Microsoft to Transition Corporate IT to Google Apps

2011/12/23-2012/2/6 [Computer/SW/Languages/Python] UID:54272 Activity:nil

12/23   In Python, why is it that '好'=='\xe5\xa5\xbd' but
        u'好'!='\xe5\xa5\xbd' ? I'm really baffled. What
        is the encoding of '\xe5\xa5\xbd'?
        \_ '好' means '\xe5\xa5\xbd' (when your terminal or source file is
           UTF-8), which is just a string of bytes; it has length 3.  Python
           doesn't know what encoding it's in.  u'好' means u'\u597d', which
           is a string of Unicode characters; it has length 1, and Python
           recognizes it as a single Chinese character.  However, it doesn't
           have any particular encoding!  You have to encode it as a byte
           string before you can output it, and you can choose whatever
           encoding you want.  u'好'.encode('utf-8') returns '\xe5\xa5\xbd'.
           See http://docs.python.org/howto/unicode.html
           \_ Wow, thanks. I always thought Unicode == UTF-8. Boy, was I
              wrong. This is all very confusing.
              \_ dear dumbass:
                 http://www.stereoplex.com/blog/python-unicode-and-unicodedecodeerror
                 http://docs.python.org/library/codecs.html
                 http://stackoverflow.com/questions/643694/utf-8-vs-unicode
                 \_ If all you've used is UTF-8, you'd have no reason to
                    suspect there are other Unicode encodings (and really,
                    if UTF-8 had been designed first, there probably wouldn't
                    be).  Not knowing about them doesn't make you dumb.
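        \_ A minimal sketch of the distinction above, assuming Python 3,
           where the thread's '...' byte string is written b'...' and the
           u'...' string is a plain '...'.  The variable names are made up
           for illustration.  It encodes the same character two ways and
           shows what happens when the UTF-8 bytes are decoded with the
           wrong codec.

```python
# Python 3 sketch: str holds Unicode code points, bytes holds raw bytes.
text = '\u597d'          # one Unicode character (好), no encoding attached
raw = b'\xe5\xa5\xbd'    # three bytes; Python does not know their encoding

# Encoding turns code points into bytes; the result depends on the codec.
utf8 = text.encode('utf-8')
utf16 = text.encode('utf-16-be')

# Decoding turns bytes back into code points; the codec must match.
decoded = raw.decode('utf-8')
mojibake = raw.decode('latin-1')   # wrong codec: three unrelated characters

print(len(text), len(raw))
print(utf8 == raw)
print(utf16)
print(decoded == text)
print(repr(mojibake))
```

           Decoding UTF-8 bytes as Latin-1 is the usual way a single
           character ends up displayed as three accented ones.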
