Entry 39587 (Berkeley CSUA MOTD)

Berkeley CSUA MOTD:Entry 39587

WIKI \| FAQ \| Tech FAQ
`http://csua.com/feed/`

2025/07/13 [General] UID:1000 Activity:popular

7/13

2005/9/9-11 [Computer/SW/Database, Computer/SW/Languages/Misc] UID:39587 Activity:nil

9/9     How do I do a SELECT STDDEV(column),MEDIAN(column) FROM table? I can
        only find references to AVG. Anyone know? Thanks,
        \_ http://dev.mysql.com/doc/mysql/en/group-by-functions.html
           Somehow I'm just _guessing_ you're using MySQL.

2025/07/13 [General] UID:1000 Activity:popular

7/13

You may also be interested in these entries...

2013/10/28-2014/2/5 [Computer/SW/Database] UID:54751 Activity:nil

10/28   Oracle software to blame for Obamacare website debacles:
        http://www.forbes.com/sites/theapothecary/2013/10/14/obamacares-website-is-crashing-because-it-doesnt-want-you-to-know-health-plans-true-costs
        \_ Larry Ellison is a secret Tea Party supporter.
           Most of this article is bunk, btw. Boy are the Republicans
           getting desperate.
            \_ Umm, no.  Larry Ellison is a not so secret fascist.
	...

2013/4/9-5/18 [Computer/SW/Languages/C_Cplusplus, Computer/SW/Apps, Computer/SW/Languages/Perl] UID:54650 Activity:nil

4/04    Is there a good way to diff 2 files that consist of columns of
        floating point numbers, such that it only tells me if there's a
        difference if the numbers on a given line differ by at least a given
        ratio?  Say, 1%?
        \_ Use Excel.
           1. Open foo.txt in Excel.  It should convert all numbers to cells in
	...

2011/12/29-2012/2/6 [Computer/SW/Database] UID:54274 Activity:nil

12/29   Is it worthwhile to use ext4 on VMs? Is Journaling necessary on VMs?
         \_ what about DBs?  I read somewhere ext3 was better for DB voumes (mysql)
	...

2011/2/14-4/20 [Computer/SW/Unix] UID:54039 Activity:nil

2/14    You sure soda isn't running windows in disguise?  It would explain the
        uptimes.
        \_ hardly, My winbox stays up longer.
        \_ Nobody cares about uptime anymore brother, that's what web2.0 has
           taught us.  Everything is "stateless".
           \_ You;d think gamers would care more about uptime.
	...

2010/11/21-2011/1/13 [Computer/SW/Languages/Web] UID:53988 Activity:moderate

11/21   Lifehacker's recommending Dreamhost as a personal web hosting service.
        Apart from csua, who do you guys use? --erikred
        \_ What do you want to use it for? Do you need CGI or PHP?  My
           brother worked for Dreamhost and said they are unethical. In
           fact, he sued them. This refers to their treatment of customers
           and employees both. I don't know who  or what "Lifehacker" is,
	...

2010/8/10-9/7 [Computer/Companies/Google] UID:53919 Activity:nil

8/10    http://www.businessinsider.com/google-puts-wave-out-of-its-misery-2010-8
        Google Wave No More. The people who worked on it were pretty smart.
        They wrote up a super awesome OKR with extremely low bar as a measure
        of success, exceeded everyone's expectations by going above those
        bars, and got big fat bonuses and promotions as a result of their
        planning. Brilliant.
	...

2010/8/12-9/7 [Computer/SW/Languages/Perl] UID:53922 Activity:nil

8/12    Ruby coders, do you mostly DIY your stuff or use the ruby libs out
        there?   How is their quality compared to other libs you have used
        for other langs?  Thx.
        \_ I use Ruby for hobby stuff, etc.  I use libraries for system stuff
           (web access, process, etc.) but that's about it.  Perl libraries are
           much better/more complete.  I assume because of the maturity and
	...

2012/4/27-6/4 [Computer/SW/Languages/Misc, Computer/SW/Unix] UID:54372 Activity:nil

4/27    I wrote a little shell script to collect iostat data:
        #!/bin/bash
        DATE=`date +%m%d`
        DATADIR=/var/tmp/user
        OUTPUTFILE=$DATADIR/$DATE.out
        while true
	...

2012/5/8-6/4 [Computer/SW/Unix] UID:54383 Activity:nil

5/8     Hello everyone!  This is Josh Hawn, CSUA Tech VP for Spring 2012.
        About 2 weeks ago, someone brought to my attention that our script
        to periodically merge /etc/motd.public into /etc/motd wasn't
        running.  When I looked into it, the cron daemon was running, but
        there hadn't been any root activity in the log since April 7th.  I
        looked into it for a while, but got lost in other things I was
	...

2011/10/26-12/6 [Computer/SW/Unix] UID:54202 Activity:nil

10/24  What's an easy way to see if say column 3 of a file matches a list of
       expressions in a file? Basically I want to combine "grep -f <file>"
       to store the patterns and awk's $3 ~ /(AAA|BBB|CCC)/ ... I realize
       I can do this with "egrep -f " and use regexp instead of strings, but
       was wondering if there was some magic way to do this.
       \_ UNIX has no magic. Make a shell script to produce the ask or egrep
	...

2011/7/30-8/10 [Computer/SW/Languages/Misc] UID:54148 Activity:nil 66%like:54150

7/29    Happy Sysadmin Day
        \_ our "sysadmin" today deleted /home. When we asked her why
           she said she didn't do it. When I checked the sudo logs,
           I found these two commands in order:
               COMMAND=/bin/rm -r /home testuser
               COMMAND=/bin/rm -r /home/testuser
	...

2011/5/19-7/13 [Computer/SW/Languages/Misc] UID:54115 Activity:nil

5/19    If script A runs, and calls script B ..... is it possible for me to exit\
        script A based on results of script B and not continue?
        \_ assume any shell
        \_ Yes.
           \_ without passing the result to some stupid temp file?
              \_ It sounds like you want "scriptb || exit", which will run
	...

2010/12/11-2011/2/19 [Computer/SW/Languages/Perl] UID:53984 Activity:nil

12/11   Anyone have experience with Perl PDF::API2 or PDF::API3?  Can you
        point me to a good tutorial for creating a simple document (a small
        table of 2-3 rows and a single image)?
	...

Cache (8192 bytes)

dev.mysql.com/doc/mysql/en/group-by-functions.htmlUnless otherwise stated, group functions ignore NULL values. If you use an group function in a statement containing no GROUP BY clause , it is equivalent to grouping on all rows. mysql> SELECT student_name, AVG(test_score) -> FROM student -> GROUP BY student_name; The calculation is performed with 64-bit (BIGINT) precision. The calculation is performed with 64-bit (BIGINT) precision. The calculation is performed with 64-bit (BIGINT) precision. This function is available as of MySQL 411 * COUNT(expr) Returns a count of the number of non-NULL values in the rows retrieved by a SELECT statement. COUNT is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values. COUNT is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. This optimization applies only to MyISAM and ISAM tables only, because an exact record count is stored for these table types and can be accessed very quickly. For transactional storage engines (InnoDB, BDB), storing an exact row count is more problematic because multiple transactions may be occurring, each of which may affect the count. COUNT(DISTINCT) returns 0 if there were no matching rows. In MySQL, you can get the number of distinct expression combinations that don't contain NULL by giving a list of expressions. If you want to sort values in the result, you should use ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. SEPARATOR is followed by the string value that should be inserted between values of result. You can remove the separator altogether by specifying SEPARATOR ''. You can set a maximum allowed length with the group_concat_max_len system variable. MIN() and MAX() return NULL if there were no matching rows. mysql> SELECT student_name, MIN(test_score), MAX(test_score) -> FROM student -> GROUP BY student_name; For MIN(), MAX(), and other aggregate functions, MySQL currently compares ENUM and SET columns by their string value rather than by the string's relative position in the set. The STDDEV() form of this function is provided for Oracle compatibility. These functions return NULL if there were no matching rows. STDDEV_POP() returns NULL if there were no matching rows. This function was added in MySQL 503 STDDEV_SAMP() returns NULL if there were no matching rows. It considers rows as the whole population, not as a sample, so it has the number of rows as the denominator. That is, the denominator is the number of rows minus one. This function was added in MySQL 503 VAR_SAMP() returns NULL if there were no matching rows. This is an extension to standard SQL, available in MySQL 41 or later. val))) = (COUNT +1)/2 assumes (COUNT+1)/2 returns an int, if not then use INT((COUNT+1)/2) There are variations for the varied defs of median too. See Rozenshtein's book on Optimizing T-SQL Advanced Programming Techniques. I have utilized this technique with great success on large tables: a multi million row table using the existing summary code took minutes while this SQL trimmed that down to less than 3 seconds! Once you get used to Delta functions, the code is very readable too! Edit For what it's worth, the SUM() function does not work on TIME values directly, although it appears to work at first glance. length))) This will return another TIME value, suitable for TIME_FORMAT() or whatever else. Edit A function for doing Base64 encoding would be a good thing to have. This can be used with encryption, but also, binary files can be converted to a convenient universal character format and stored neatly as text. Edit I haven't tested this much, but I can say for sure that in at least one single-table query using SQL_CALC_FOUND_ROWS/FOUND_ROWS() is MUCH slower than running the query with a LIMIT and then running it again with a COUNT. This particular query is run from PHP and returns 16720 rows with a LIMIT of 25. Running it with two queries (ie, one with LIMIT and the other with COUNT) gives a query time of around 450ms, while running it with SQL_CALC_FOUND_ROWS (LIMIT 25)/FOUND_ROWS() takes slightly over 600ms. Edi t I agree that the limitations of GROUP BY when used along with ORDER BY are very frustrating. I think the standard way to get around this limitation is with a sub-query, which MySQL only supports in version 41 or greater. If you sort things the way you need them while populating the temporary table (without GROUP BY), then subsequent queries that join the temporary table will provide suitable results. Thus, the GROUP BY can in a round-about way be used with ORDER BY. Edit NULL handling seems to be a bit iffy in the HAVING clause, which you might compensate for by using IFNULL: Suppose you have a Users table with column (of unique) Id and a Logins table with Id, LastUse colums such that Id's in Logins may be repeated or might not appear at all. Edit FINDIND DUPLICATED RECORDS ------------------------------------------------ Well, I don't know toomuch about this but it works for me, and fast enough. SELECT columnname, COUNT(columnname) FROM tablename GROUP BY columnname HAVING COUNT (columnname)>1 Bye community! Edit To select the latest 5 unique topics, using a datetime field called time, use something like: "select topic, max(time) as m from nuke_stories group by topic order by m desc limit 5;" Edi t Another take on MEDIAN: The SELECT statement listed in a comment above does not work for all lists of values (especially when the median value is not unique in the list, or when the median should be the average of the two middle-most values in an even-sized list). The resulting medians table will either have 1 or 2 rows. If the size of your original data is an odd number or the two middle-most data values are the same there will only be 1 row. If the size of your original data is an even number and the middle data values are not equal, there will be 2 rows. Take the average to get the true median: SELECT AVG(val) AS median FROM medians; The last step is to drop your temporary table medians: DROP TABLE medians; I couldn't think of another way to compute the MEDIAN without using a temporary table. If you know how to optimize my technique, I would love to hear from you. Edi t Is it just me that finds the (lacking) description of GROUP BY hard to follow? Good feature though, the ANSI GROUP BY is too redundant, and limits you to 10 GROUP columns, which while high, is still a limit. Edit Just a suggestion for the developers to add a non-distinct function, whic h is something I am always using a perl script to 'grep out' deletes fro m tables with. Having something like: Delete from some_table where some_column is not distinct. That would leave a single occurance and get rid of the duplicates. Edit If you want to perform a GROUP BY, but you want values in rows that you a ren't grouping by and don't appear first in the table, you can do so by joining the table to itself. For example, in a table that has the fields "tile_id", "name", "orientati on" and "imagename": 1) I want the fields "tile_id", "name" and "imagename" to be returned. name Because it's joining where the two tables' orientations are not equal AND where the names ARE equal, that automatically finds results in which th ere is more than one. name HAVING thecount > 1 If you aren't picky and don't mind resuts with only one orientation, of c ourse, you can simply remove the HAVING clause from the example above. Edit I found the above MEDIAN calculations to be far too slow. I don't underst and how it can be efficient since its an O join... Here is a far more efficient method of finding medians (uses temporary ta bles, but some things you can't do in one sql statement). For my particu lar application, I needed to find medians for each group of rows. Here i s a simple example (uses a temp table for the original data, so you can copy-paste it into a mysql term without needing to cleanup anything): ==================================== # setup example data CREATE TEMPORARY TABLE data (group_id INT, value FLOAT); INSERT INTO data VALUES (1,...