Aggregate Functions in SQL
Learn via video course
Overview
An aggregate function in SQL performs a calculation on multiple values and returns a single value. SQL provides many aggregate functions that include avg, count, sum, min, max, etc. An aggregate function ignores NULL values when it performs the calculation, except for the count function.
Scope
- This article gives a brief Introduction to Aggregate Functions in SQL.
- Later in the article, we learn what are Aggregate Functions. Different types of Aggregate functions with SQL Syntax and an example query.
What is Aggregate Function in SQL?
Suppose your task is to find the average salary of an employee in some organization provided the salaries of every employee working in that organization. What would you do? The first thing you can think of is adding each one of the salaries and then dividing it by the total number of employees. If the given dataset consists of 10-15 entries, this task seems easy but what would you do with a dataset with thousands of entries?
Adding up to thousands of entries and then dividing it by the total number can be very hectic and prone to error. No worries, SQL comes to the rescue here. SQL provides an aggregate function AVG() which will directly return the average of the numbers over the data you want. This reduces the chances of error as well as saves time.
SQL provides us with some really commendable features and aggregate functions are definitely one of those features.
Aggregate functions in SQL are used to perform operations on multiple rows of a table and return a single value. A single function can be used to compute multiple values and return the required result.
Before we dig deep into SQL aggregate functions, let us take a look at the sample data we’ll be using. The given table named EMP_DATA consists of data concerning 10 employees working in the same organization in different departments.
The table stores a unique EMPLOYEE_ID, Name of the employee, department in which the employee is working, and the salary.
Employee_ID | Name | Department | Salary |
---|---|---|---|
1 | Ram | Marketing | 80000 |
2 | Henry | Production | 76000 |
3 | Disha | R&D | 76000 |
4 | Helen | R&D | 84000 |
5 | Meera | Marketing | 80000 |
6 | Ashish | Production | 64000 |
7 | Bob | Production | 60000 |
8 | Hari | R&D | 60000 |
9 | Preeti | Marketing | NULL |
10 | Mark | Production | 66000 |
Let us dig into each SQL aggregate function with the help of the above table EMP_DATA.
Types of Aggregate Functions in SQL
Aggregate functions in SQL can be of the following types as shown in the figure. We will be understanding the working of these functions one by one.
1. COUNT() Function
The COUNT() aggregate function returns the total number of rows from a database table that matches the defined criteria in the SQL query.
Syntax:
COUNT(*) returns the total number of rows in a given table. COUNT(COULUMN_NAME) returns the total number of non-null values present in the column which is passed as an argument in the function.
Let’s take a look at a few examples to understand the function better.
Example:
Suppose you want to know the total number of employees working in the organization. You can do so by the below-given query.
As COUNT(*) returns the total number of rows and the table named EMP_DATA provided above consists of 10 rows, so the COUNT(*) function returns 10. The output is printed as shown below.
Output:
Note: Except for COUNT(*), all other SQL aggregate functions ignore NULL values.
Suppose you need to count the number of people who are getting a salary. The query given below can help you achieve this.
Here, the Salary column is passed as a parameter to the COUNT() function, and hence, this query returns the number of non NULL values from the column Salary, i.e. 9.
Output:
Suppose you need to count the number of distinct departments present in the organization. The following query can help you achieve this.
The above query returns the total number of distinct non NULL values over the column Department i.e. 3 (Marketing, Production, R&D). The DISTINCT keyword makes sure that only non-repetitive values are counted.
Output:
What if you want to calculate the number of people whose salaries are more than a given amount(say 70,000)? Check out the example below.
The query returns the number of rows where the salary of the employee is greater than or equal to 70,000 i.e 5.
Output:
2. SUM() Function
The SUM() function takes the name of the column as an argument and returns the sum of all the non NULL values in that column. It works only on numeric fields(i.e the columns contain only numeric values). When applied to columns containing both non-numeric(ex - strings) and numeric values, only numeric values are considered. If no numeric values are present, the function returns 0.
Syntax:
The function name is SUM() and the name of the column to be considered is passed as an argument to the function.
Let’s look into some examples to understand the usage better.
Example:
Suppose you need to build a budget for the organization and you need to know the total amount needed to provide salaries to all the employees. To calculate the sum of all the values present in column Salary. You can refer to the below-given example.
The above-mentioned query returns the sum of all non-NULL values over the column Salary i.e 80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 = 646000
Output:
What if you need to consider only distinct salaries? The following query will help you achieve that.
The DISTINCT keyword makes sure that only non-repetitive values are considered. The query returns the sum of all distinct non NULL values over the column Salary i.e. 80000 + 76000 + 84000 + 64000 + 60000 + 66000 = 430000
Output:
Suppose you need to know the collective salaries for each department(say Marketing). The query given below can help you achieve this.
The query returns the sum of salaries of employees who are working in the Marketing Department i.e 80000 + 80000 = 160000
Output:
Note: There are 3 rows consisting of Marketing as Department value but the third value is a NULL value. Thus, the sum is returned considering only the first two entries having Marketing as Department.
3.AVG() Function
The AVG() aggregate function uses the name of the column as an argument and returns the average of all the non NULL values in that column. It works only on numeric fields(i.e the columns contain only numeric values).
Note: When applied to columns containing both non-numeric (ex - strings) and numeric values, only numeric values are considered. If no numeric values are present, the function returns 0.
Syntax:
The function name is AVG() and the name of the column to be considered is passed as an argument to the function.
Let's take a look at some examples to understand the function better.
Example:
To obtain the average salary of an employee of an organization, the following query can be used.
Here, the column name Salary is passed as an argument and thus the values present in column Salary are considered. The above query returns the average of all non NULL values present in the Salary column of the table.
Average = (80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 ) / 9 = 646000 / 9 = 71777.77777
Output:
If you need to consider only distinct salaries, the following query will help you out.
The query returns the average of all non NULL distinct values present in the Salary column of the table.
Average = (80000 + 76000 + 84000 + 64000 + 60000 + 66000) / 6 = 430000/ 6 = 71666.66666
Output:
4. MIN() Function
The MIN() function takes the name of the column as an argument and returns the minimum value present in the column. MIN() returns NULL when no row is selected.
Syntax:
The function name is MIN() and the name of the column to be considered is passed as an argument to the function.
To understand this better, let’s take a look at some examples.
Example:
Suppose you want to find out what is the minimum salary that is provided by the organization. The MIN() function can be used here with the column name as an argument.
The query returns the minimum value of all the values present in the mentioned column i.e 60000.
Output:
Suppose you need to know the minimum salary of an employee belonging to the Production department. The following query will help you achieve that.
The query returns the minimum value of all the values present in the mentioned column and has Production as Department value i.e 60000.
Output:
5. MAX() Function
The MAX() function takes the name of the column as an argument and returns the maximum value present in the column. MAX() returns NULL when no row is selected.
Syntax:
The function name is MAX() and the name of the column to be considered is passed as an argument to the function.
To get a better idea of how the function works, let’s look at some examples.
Example:
Suppose you want to find out what is the maximum salary that is provided by the organization. The MAX() function can be used here with the column name as an argument.
The query returns the maximum value of all the values present in the mentioned column i.e 84000.
Output:
Suppose you need to know the maximum salary of an employee belonging to the R&D department. The following query will help you achieve that.
The query returns the maximum value of all the values present in the mentioned column and has R&D as Department value i.e 84000.
Conclusion
- Aggregate functions in SQLs are an exceptionally useful feature in data retrieval from the database. The fact that you can query data and apply functions in the same query is remarkable
- A single-line query can be used to perform operations on multiple records. Using an SQL aggregate function saves time as well as avoids any errors.