Aggregate and Grouping Functions in SQL | Learn SQL: Aggregate Functions - 白红宇

Published: 2019-05-11

SQL has many cool features, and aggregate functions are definitely one of them (strictly speaking, they are functions rather than features). While they are not specific to SQL, they are used often. They are part of the SELECT statement, which lets us combine all the benefits of SELECT (joining tables, filtering only the rows and columns we need) with the power of these functions.

The Model

Before we start talking about aggregate functions, we'll briefly comment on the data model we'll be using.

This is the same model we've been using in a few past articles. I won't go into details, but I will mention that all 6 tables in the model contain data. Some of the records in tables are referenced in other tables, while some are not. E.g., we have countries without any related city, and we have cities without any related customers. We'll point this out wherever it matters in the article.

The Simplest Aggregate Function

We'll, of course, start with the simplest possible aggregate function. But before we do, let's check the contents of the two tables we'll use throughout this article: country and city. We'll use the following statements:

SELECT * FROM country;
SELECT * FROM city;

You can see the result in the picture below:

This is nothing new or unexpected. We've just listed everything in our tables: “*” in the query returns all columns (attributes), and the absence of any condition (a WHERE clause) returns all rows.

The only thing I would like to point out is that the country table has 7 rows and that the city table has 6 rows. Now, let’s examine the following queries and their result:

We can notice that each query returned a single row, and the number returned is the number of rows in the corresponding table. That's what the aggregate function COUNT does: it takes what the query without COUNT would return and then returns the number of rows in that result. One more important thing you should be aware of is that only COUNT (and its T-SQL variant COUNT_BIG) can be used with “*”. All other aggregate functions require an attribute (or expression) between the parentheses. We'll see that later.
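A minimal sketch of counting rows with COUNT(*), assuming Python's built-in sqlite3 module and an in-memory SQLite database in place of SQL Server. The table and column names (country, city, country_name, city_name, country_id, lat) and the row counts (7 and 6, with Spain and Russia having no cities) come from the article; the table definitions, the other country and city names, and the latitudes (rounded to quarter degrees so sums stay exact) are hypothetical.

```python
import sqlite3

# Hypothetical sample data: 7 countries and 6 cities, matching only the
# row counts mentioned in the article (Spain and Russia have no cities).
SETUP = """
CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT NOT NULL);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    lat REAL,
    country_id INTEGER NOT NULL REFERENCES country (id)
);
INSERT INTO country (id, country_name) VALUES
    (1, 'Germany'), (2, 'Serbia'), (3, 'Croatia'), (4, 'USA'),
    (5, 'Poland'), (6, 'Spain'), (7, 'Russia');
INSERT INTO city (id, city_name, lat, country_id) VALUES
    (1, 'Berlin', 52.5, 1), (2, 'Belgrade', 44.75, 2),
    (3, 'Zagreb', 45.75, 3), (4, 'New York', 40.75, 4),
    (5, 'Los Angeles', 34.0, 4), (6, 'Warsaw', 52.25, 5);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Each COUNT(*) query collapses its whole table into one row holding the row count.
country_count = conn.execute("SELECT COUNT(*) AS number_of_rows FROM country").fetchone()[0]
city_count = conn.execute("SELECT COUNT(*) AS number_of_rows FROM city").fetchone()[0]

# Cross-check: fetch every row with SELECT * and count them in Python instead.
assert country_count == len(conn.execute("SELECT * FROM country").fetchall())
assert city_count == len(conn.execute("SELECT * FROM city").fetchall())

print(country_count, city_count)  # 7 6
```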

Aggregate Functions & JOINs

Now let’s try two more things. First, we’ll test how COUNT works when we’re joining tables. To do that, we’ll use the following queries:

SELECT *
FROM country
INNER JOIN city ON city.country_id = country.id;

SELECT COUNT(*) AS number_of_rows
FROM country
INNER JOIN city ON city.country_id = country.id;

[Figure: checking the contents of the joined tables and COUNTing the rows]

While the first query is not needed, I've used it to show what the join returns, because that result is exactly what the second query counts. When two tables are joined, you can think of the result as an intermediate table that can be used like any other table (e.g., for calculations with aggregate functions, or in subqueries).
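To illustrate that a join result behaves like an intermediate table, the sketch below counts the INNER JOIN rows three ways: by fetching them in Python, with COUNT(*) on the join, and with COUNT(*) over the join wrapped in a subquery. It assumes SQLite through Python's sqlite3 module and a hypothetical data set of 7 countries and 6 cities; the alias joined is also an assumption.

```python
import sqlite3

# Hypothetical data: 7 countries, 6 cities; Spain and Russia have no cities.
SETUP = """
CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT NOT NULL);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    lat REAL,
    country_id INTEGER NOT NULL REFERENCES country (id)
);
INSERT INTO country (id, country_name) VALUES
    (1, 'Germany'), (2, 'Serbia'), (3, 'Croatia'), (4, 'USA'),
    (5, 'Poland'), (6, 'Spain'), (7, 'Russia');
INSERT INTO city (id, city_name, lat, country_id) VALUES
    (1, 'Berlin', 52.5, 1), (2, 'Belgrade', 44.75, 2),
    (3, 'Zagreb', 45.75, 3), (4, 'New York', 40.75, 4),
    (5, 'Los Angeles', 34.0, 4), (6, 'Warsaw', 52.25, 5);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# The join on its own: one row per matching (country, city) pair.
JOIN_SQL = """
SELECT country.country_name, city.city_name
FROM country
INNER JOIN city ON city.country_id = country.id
"""
joined_rows = conn.execute(JOIN_SQL).fetchall()

# COUNT(*) applied directly to the join...
direct_count = conn.execute("""
SELECT COUNT(*) AS number_of_rows
FROM country
INNER JOIN city ON city.country_id = country.id
""").fetchone()[0]

# ...and to the same join used as an intermediate table in a subquery.
subquery_count = conn.execute(
    f"SELECT COUNT(*) AS number_of_rows FROM ({JOIN_SQL}) AS joined"
).fetchone()[0]

print(len(joined_rows), direct_count, subquery_count)  # 6 6 6
```

The subquery selects named columns rather than “*” because both tables have an id column, and SQL Server rejects a derived table that contains the same column name twice.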

Tip: Whenever you're writing a complex query, you can check what each of its parts returns; that way you'll be sure your query works, and will keep working, as expected.

We should also notice one more thing: we've used INNER JOIN to join the tables country and city. This eliminates countries without any cities from the result, because INNER JOIN returns only rows that have a match in both tables. Now we'll run 3 more queries where the tables are joined using LEFT JOIN:

SELECT *
FROM country
LEFT JOIN city ON city.country_id = country.id;

SELECT COUNT(*) AS number_of_rows
FROM country
LEFT JOIN city ON city.country_id = country.id;

SELECT COUNT(country.country_name) AS countries, COUNT(city.city_name) AS cities
FROM country
LEFT JOIN city ON city.country_id = country.id;

[Figure: testing the contents of tables and performing simple COUNTs]

We can notice a few things:

The 1st query returned 8 rows. These are the same 6 rows as in the query using INNER JOIN, plus 2 more rows for countries that don't have any related city (Russia & Spain).

The 2nd query counts the number of rows the 1st query returns, so this number is 8.

The 3rd query has two important things to comment on. The first one is that we've used an aggregate function (COUNT) twice in the SELECT part of the query. This will usually be the case, because you're interested in more details about the group you want to analyze (number of records, average values, etc.). The second important thing is that these 2 counts used column names instead of “*”, and they returned different values. That happens because COUNT was designed that way: if you put a column name between the parentheses, COUNT counts how many values there are, not including NULL values. All our records had a value for country_name, so the 1st COUNT returned 8. On the other hand, city_name was undefined (NULL) in 2 rows, so the 2nd COUNT returned 6 (8 - 2 = 6).
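The three LEFT JOIN queries can be reproduced in a short sketch, assuming SQLite through Python's sqlite3 module instead of SQL Server, and a hypothetical data set of 7 countries and 6 cities in which Spain and Russia have no cities. Countries without a city come back with NULL in the city columns, which sqlite3 returns as None.

```python
import sqlite3

# Hypothetical data: 7 countries, 6 cities; Spain and Russia have no cities.
SETUP = """
CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT NOT NULL);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    lat REAL,
    country_id INTEGER NOT NULL REFERENCES country (id)
);
INSERT INTO country (id, country_name) VALUES
    (1, 'Germany'), (2, 'Serbia'), (3, 'Croatia'), (4, 'USA'),
    (5, 'Poland'), (6, 'Spain'), (7, 'Russia');
INSERT INTO city (id, city_name, lat, country_id) VALUES
    (1, 'Berlin', 52.5, 1), (2, 'Belgrade', 44.75, 2),
    (3, 'Zagreb', 45.75, 3), (4, 'New York', 40.75, 4),
    (5, 'Los Angeles', 34.0, 4), (6, 'Warsaw', 52.25, 5);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# LEFT JOIN keeps every country; unmatched ones get NULL (None) city columns.
LEFT_JOIN = "FROM country LEFT JOIN city ON city.country_id = country.id"

rows = conn.execute(f"SELECT country.country_name, city.city_name {LEFT_JOIN}").fetchall()
number_of_rows = conn.execute(f"SELECT COUNT(*) AS number_of_rows {LEFT_JOIN}").fetchone()[0]

# COUNT(column) skips NULLs, so the two counts differ.
countries, cities = conn.execute(
    f"SELECT COUNT(country.country_name) AS countries, COUNT(city.city_name) AS cities {LEFT_JOIN}"
).fetchone()

# Countries that matched no city at all.
without_city = sorted(name for name, city_name in rows if city_name is None)

print(len(rows), number_of_rows, countries, cities)  # 8 8 8 6
print(without_city)  # ['Russia', 'Spain']
```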

Note: This holds for other aggregate functions as well. If they run into NULL values, they simply ignore them and calculate as if those values didn't exist.
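A small sketch of this rule, assuming SQLite through Python's sqlite3 module and a hypothetical table named reading with one NULL value. COUNT(*) still counts that row, while COUNT, SUM, AVG, MIN, and MAX on the column skip it, so AVG divides by 2 rather than 3. The second query shows a related edge case in standard SQL (and in SQLite, used here): when every input value is NULL, SUM returns NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two real values and one NULL.
conn.executescript("""
CREATE TABLE reading (id INTEGER PRIMARY KEY, value REAL);
INSERT INTO reading (id, value) VALUES (1, 10.0), (2, 20.0), (3, NULL);
""")

# COUNT(*) counts rows; the column-based aggregates ignore the NULL.
stats = conn.execute("""
SELECT COUNT(*), COUNT(value), SUM(value), AVG(value), MIN(value), MAX(value)
FROM reading
""").fetchone()
print(stats)  # (3, 2, 30.0, 15.0, 10.0, 20.0)

# With only NULL inputs, SUM yields NULL (None) while COUNT(value) yields 0.
only_nulls = conn.execute("""
SELECT SUM(value), COUNT(value)
FROM reading
WHERE value IS NULL
""").fetchone()
print(only_nulls)  # (None, 0)
```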

SQL Aggregate Functions

Now it's time to mention all T-SQL aggregate functions. The most commonly used are:

COUNT – counts the number of elements in the defined group

SUM – calculates the sum of the given attribute/expression in the defined group

AVG – calculates the average value of the given attribute/expression in the defined group

MIN – finds the minimum in the defined group

MAX – finds the maximum in the defined group

These 5 are the most commonly used, and they are part of the SQL standard, so you'll need them not only in SQL Server but also in other DBMSs. The remaining T-SQL aggregate functions are:

APPROX_COUNT_DISTINCT

CHECKSUM_AGG

COUNT_BIG

GROUPING

GROUPING_ID

STDEV

STDEVP

STRING_AGG

VAR

VARP

While most aggregate functions can be used without a GROUP BY clause (GROUPING and GROUPING_ID are the exceptions), the whole point is to use them together with GROUP BY. That clause is where you define how the groups are created; once the groups exist, the aggregated values are calculated for each of them.
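Conceptually, GROUP BY first splits the rows into groups by the grouping columns, and the aggregate functions then reduce each group to one value. The plain-Python sketch below only models that two-step idea; the function group_and_aggregate and the sample rows are hypothetical, and it does not reproduce how a database engine actually executes the query or handles NULLs.

```python
from collections import defaultdict
from statistics import mean


def group_and_aggregate(rows, key_index, value_index):
    """Conceptual model of
    SELECT key, COUNT(*), MIN(value), MAX(value), AVG(value) FROM rows GROUP BY key
    (NULL handling is left out to keep the sketch short)."""
    # Step 1 (GROUP BY): put every row into exactly one group, keyed by the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    # Step 2 (aggregates): reduce each group to a single result row.
    return {
        key: (len(values), min(values), max(values), mean(values))
        for key, values in groups.items()
    }


# Hypothetical rows: (country_name, city_name, lat)
cities = [
    ("Germany", "Berlin", 52.5),
    ("USA", "New York", 40.75),
    ("USA", "Los Angeles", 34.0),
    ("Poland", "Warsaw", 52.25),
]

summary = group_and_aggregate(cities, key_index=0, value_index=2)
print(summary["USA"])  # (2, 34.0, 40.75, 37.375)
```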

Example: Imagine that you have a list of professional athletes and you know which sport each one of them plays. You could ask yourself something like: from my list, return the minimum, maximum, and average height of players, grouped by the sport they play. The result would, of course, be the MIN, MAX, and AVG height for groups such as “football players”, “basketball players”, etc.
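The athletes question translates almost word for word into a GROUP BY query. The sketch below assumes a hypothetical athlete table (athlete_name, sport, height_cm) with made-up values and runs it in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical athletes; names and heights are made up for illustration.
conn.executescript("""
CREATE TABLE athlete (
    id INTEGER PRIMARY KEY,
    athlete_name TEXT NOT NULL,
    sport TEXT NOT NULL,
    height_cm REAL
);
INSERT INTO athlete (id, athlete_name, sport, height_cm) VALUES
    (1, 'Player 1', 'football', 180),
    (2, 'Player 2', 'football', 176),
    (3, 'Player 3', 'basketball', 201),
    (4, 'Player 4', 'basketball', 195),
    (5, 'Player 5', 'basketball', 207);
""")

# One output row per sport: the grouping column plus three aggregates.
rows = conn.execute("""
SELECT sport,
       MIN(height_cm) AS min_height,
       MAX(height_cm) AS max_height,
       AVG(height_cm) AS avg_height
FROM athlete
GROUP BY sport
ORDER BY sport
""").fetchall()

for sport, min_height, max_height, avg_height in rows:
    print(sport, min_height, max_height, avg_height)
# basketball 195.0 207.0 201.0
# football 176.0 180.0 178.0
```

ORDER BY is there only to make the output order predictable; GROUP BY on its own does not guarantee any order.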

Aggregate Functions – Examples

Now, let’s take a look at how these functions work on a single table. They are rarely used this way, but it’s good to see it, at least for educational purposes:

The query returned aggregated values computed over all cities. While these particular values don't have much practical use, they show the power of aggregate functions.
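A sketch of such a single-table query, assuming it applies COUNT, SUM, AVG, MIN, and MAX to city.lat. It runs in SQLite through Python's sqlite3 module; the aliases and the 6 hypothetical latitudes (rounded to quarter degrees so the sum is exact) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical city table with 6 rows and rounded latitudes.
conn.executescript("""
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    lat REAL,
    country_id INTEGER NOT NULL
);
INSERT INTO city (id, city_name, lat, country_id) VALUES
    (1, 'Berlin', 52.5, 1), (2, 'Belgrade', 44.75, 2),
    (3, 'Zagreb', 45.75, 3), (4, 'New York', 40.75, 4),
    (5, 'Los Angeles', 34.0, 4), (6, 'Warsaw', 52.25, 5);
""")

# No GROUP BY: the whole table is treated as one group, so one row comes back.
row = conn.execute("""
SELECT COUNT(*) AS number_of_cities,
       SUM(lat) AS lat_sum,
       AVG(lat) AS lat_avg,
       MIN(lat) AS lat_min,
       MAX(lat) AS lat_max
FROM city
""").fetchone()
print(row)  # (6, 270.0, 45.0, 34.0, 52.5)
```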

Now we'll do something smarter. We'll use these functions in a way much closer to what you could expect in real-life situations:

[Figure: aggregate function examples using INNER JOIN]

This is a much “smarter” query than the previous one. It returned the list of countries that have at least one city (the INNER JOIN drops the others), with the number of cities in each, as well as the SUM, AVG, MIN, and MAX of their cities' lat values.

Please notice that we've used the GROUP BY clause. By placing country.id and country.country_name in it, we've defined the groups: all cities belonging to the same country end up in the same group. After the groups are created, the aggregated values are calculated for each of them.

Note: The GROUP BY clause must contain all attributes that appear outside aggregate functions (in our case, country.country_name). You can also include other attributes; we've included country.id because we're sure it uniquely identifies each country.
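The per-country query described above can be sketched as follows, assuming SQLite through Python's sqlite3 module and a hypothetical data set of 7 countries and 6 cities in which Spain and Russia have no cities. Because the query uses INNER JOIN, those two countries do not appear in the output.

```python
import sqlite3

# Hypothetical data: 7 countries, 6 cities; Spain and Russia have no cities.
SETUP = """
CREATE TABLE country (id INTEGER PRIMARY KEY, country_name TEXT NOT NULL);
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL,
    lat REAL,
    country_id INTEGER NOT NULL REFERENCES country (id)
);
INSERT INTO country (id, country_name) VALUES
    (1, 'Germany'), (2, 'Serbia'), (3, 'Croatia'), (4, 'USA'),
    (5, 'Poland'), (6, 'Spain'), (7, 'Russia');
INSERT INTO city (id, city_name, lat, country_id) VALUES
    (1, 'Berlin', 52.5, 1), (2, 'Belgrade', 44.75, 2),
    (3, 'Zagreb', 45.75, 3), (4, 'New York', 40.75, 4),
    (5, 'Los Angeles', 34.0, 4), (6, 'Warsaw', 52.25, 5);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# Every non-aggregated column (country_name) is listed in GROUP BY, plus the unique id.
rows = conn.execute("""
SELECT country.country_name,
       COUNT(*) AS number_of_cities,
       SUM(city.lat) AS lat_sum,
       AVG(city.lat) AS lat_avg,
       MIN(city.lat) AS lat_min,
       MAX(city.lat) AS lat_max
FROM country
INNER JOIN city ON city.country_id = country.id
GROUP BY country.id, country.country_name
ORDER BY country.country_name
""").fetchall()

for row in rows:
    print(row)
# ('Croatia', 1, 45.75, 45.75, 45.75, 45.75)
# ('Germany', 1, 52.5, 52.5, 52.5, 52.5)
# ('Poland', 1, 52.25, 52.25, 52.25, 52.25)
# ('Serbia', 1, 44.75, 44.75, 44.75, 44.75)
# ('USA', 2, 74.75, 37.375, 34.0, 40.75)
```

SQLite is more permissive than SQL Server here and would accept a non-aggregated column missing from GROUP BY, so this sketch cannot show the error SQL Server raises; listing every non-aggregated column in GROUP BY keeps the query portable.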

Conclusion

Aggregate functions are a very powerful tool in databases. They serve the same purpose as their equivalents in MS Excel, but the magic is that you can query data and apply functions in the same statement. Today, we’ve seen basic examples. Later in this series, we’ll use them to solve more complicated problems (with more complicated queries), so stay tuned.
