dev.mysql.com/doc/mysql/en/group-by-functions.html
Unless otherwise stated, group functions ignore NULL values. If you use an group function in a statement containing no GROUP BY clause , it is equivalent to grouping on all rows.
mysql> SELECT student_name, AVG(test_score) -> FROM student -> GROUP BY student_name; The calculation is performed with 64-bit (BIGINT) precision. The calculation is performed with 64-bit (BIGINT) precision. The calculation is performed with 64-bit (BIGINT) precision. This function is available as of MySQL 411 * COUNT(expr) Returns a count of the number of non-NULL values in the rows retrieved by a SELECT statement. COUNT is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values. COUNT is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. This optimization applies only to MyISAM and ISAM tables only, because an exact record count is stored for these table types and can be accessed very quickly. For transactional storage engines (InnoDB, BDB), storing an exact row count is more problematic because multiple transactions may be occurring, each of which may affect the count.
COUNT(DISTINCT) returns 0 if there were no matching rows. In MySQL, you can get the number of distinct expression combinations that don't contain NULL by giving a list of expressions.
If you want to sort values in the result, you should use ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. SEPARATOR is followed by the string value that should be inserted between values of result. You can remove the separator altogether by specifying SEPARATOR ''. You can set a maximum allowed length with the group_concat_max_len system variable.
MIN() and MAX() return NULL if there were no matching rows. mysql> SELECT student_name, MIN(test_score), MAX(test_score) -> FROM student -> GROUP BY student_name; For MIN(), MAX(), and other aggregate functions, MySQL currently compares ENUM and SET columns by their string value rather than by the string's relative position in the set. The STDDEV() form of this function is provided for Oracle compatibility. These functions return NULL if there were no matching rows. STDDEV_POP() returns NULL if there were no matching rows. This function was added in MySQL 503 STDDEV_SAMP() returns NULL if there were no matching rows.
It considers rows as the whole population, not as a sample, so it has the number of rows as the denominator. That is, the denominator is the number of rows minus one. This function was added in MySQL 503 VAR_SAMP() returns NULL if there were no matching rows. This is an extension to standard SQL, available in MySQL 41 or later.
val))) = (COUNT +1)/2 assumes (COUNT+1)/2 returns an int, if not then use INT((COUNT+1)/2) There are variations for the varied defs of median too. See Rozenshtein's book on Optimizing T-SQL Advanced Programming Techniques.
I have utilized this technique with great success on large tables: a multi million row table using the existing summary code took minutes while this SQL trimmed that down to less than 3 seconds! Once you get used to Delta functions, the code is very readable too!
Edit For what it's worth, the SUM() function does not work on TIME values directly, although it appears to work at first glance. length))) This will return another TIME value, suitable for TIME_FORMAT() or whatever else.
Edit A function for doing Base64 encoding would be a good thing to have. This can be used with encryption, but also, binary files can be converted to a convenient universal character format and stored neatly as text.
Edit I haven't tested this much, but I can say for sure that in at least one single-table query using SQL_CALC_FOUND_ROWS/FOUND_ROWS() is MUCH slower than running the query with a LIMIT and then running it again with a COUNT. This particular query is run from PHP and returns 16720 rows with a LIMIT of 25. Running it with two queries (ie, one with LIMIT and the other with COUNT) gives a query time of around 450ms, while running it with SQL_CALC_FOUND_ROWS (LIMIT 25)/FOUND_ROWS() takes slightly over 600ms.
Edi t I agree that the limitations of GROUP BY when used along with ORDER BY are very frustrating. I think the standard way to get around this limitation is with a sub-query, which MySQL only supports in version 41 or greater.
If you sort things the way you need them while populating the temporary table (without GROUP BY), then subsequent queries that join the temporary table will provide suitable results. Thus, the GROUP BY can in a round-about way be used with ORDER BY.
Edit NULL handling seems to be a bit iffy in the HAVING clause, which you might compensate for by using IFNULL: Suppose you have a Users table with column (of unique) Id and a Logins table with Id, LastUse colums such that Id's in Logins may be repeated or might not appear at all.
Edit FINDIND DUPLICATED RECORDS ------------------------------------------------ Well, I don't know toomuch about this but it works for me, and fast enough. SELECT columnname, COUNT(columnname) FROM tablename GROUP BY columnname HAVING COUNT (columnname)>1 Bye community!
Edit To select the latest 5 unique topics, using a datetime field called time, use something like: "select topic, max(time) as m from nuke_stories group by topic order by m desc limit 5;"
Edi t Another take on MEDIAN: The SELECT statement listed in a comment above does not work for all lists of values (especially when the median value is not unique in the list, or when the median should be the average of the two middle-most values in an even-sized list). The resulting medians table will either have 1 or 2 rows. If the size of your original data is an odd number or the two middle-most data values are the same there will only be 1 row. If the size of your original data is an even number and the middle data values are not equal, there will be 2 rows. Take the average to get the true median: SELECT AVG(val) AS median FROM medians; The last step is to drop your temporary table medians: DROP TABLE medians; I couldn't think of another way to compute the MEDIAN without using a temporary table. If you know how to optimize my technique, I would love to hear from you.
Edi t Is it just me that finds the (lacking) description of GROUP BY hard to follow? Good feature though, the ANSI GROUP BY is too redundant, and limits you to 10 GROUP columns, which while high, is still a limit.
Edit Just a suggestion for the developers to add a non-distinct function, whic h is something I am always using a perl script to 'grep out' deletes fro m tables with. Having something like: Delete from some_table where some_column is not distinct. That would leave a single occurance and get rid of the duplicates.
Edit If you want to perform a GROUP BY, but you want values in rows that you a ren't grouping by and don't appear first in the table, you can do so by joining the table to itself. For example, in a table that has the fields "tile_id", "name", "orientati on" and "imagename": 1) I want the fields "tile_id", "name" and "imagename" to be returned. name Because it's joining where the two tables' orientations are not equal AND where the names ARE equal, that automatically finds results in which th ere is more than one. name HAVING thecount > 1 If you aren't picky and don't mind resuts with only one orientation, of c ourse, you can simply remove the HAVING clause from the example above.
Edit I found the above MEDIAN calculations to be far too slow. I don't underst and how it can be efficient since its an O join... Here is a far more efficient method of finding medians (uses temporary ta bles, but some things you can't do in one sql statement). For my particu lar application, I needed to find medians for each group of rows. Here i s a simple example (uses a temp table for the original data, so you can copy-paste it into a mysql term without needing to cleanup anything): ==================================== # setup example data CREATE TEMPORARY TABLE data (group_id INT, value FLOAT); INSERT INTO data VALUES (1,...
|