SQL Self Join
Learn via video course
Overview
Self Join is a method of joining a table with itself. Broadly speaking, we are joining a table with itself, according to some specified condition, to extract some data from it.
What is Self Join in SQL?
As specified by name, self-join allows us to join a table to itself. That is, each row of the table is joined with itself and all other rows depending on some conditions. In other words, we can say that it is a join between two copies of the same table.
Ever wondered how you find out the supervisor of an employee from a given employee table having the fields employee Id, employee Name, supervisor Id. The table on which SELF JOIN is represented below:
employee Id | employee Name | supervisor Id |
---|---|---|
1 | name1 | 2 |
2 | name2 | 3 |
3 | name3 | 3 |
4 | name4 | 2 |
5 | name5 | 3 |
To understand the process of generation of the new table using SELF JOIN, read the article to know more.
Syntax
The syntax of self join depends on 3 different things. These are:
- Select Statement: Statement by which we can select the attributes/column of the table.
- Aliasing: Giving a temporary name to a table
- Condition on which tables are joined together
Let's continue to understand the basic syntax of self-join in SQL.
In the first line, we have used the SELECT statement SELECT Table. Attribute from which is used to select any attribute of the table. After that, we will be giving temporary names to the table by using the command Table as T1. In the end, we will use the condition based on which we are going to join the table where T1.id = T2.id.
Examples
Let us discuss some examples where we can use self-join in SQL.
Example 1: Finding Relationships in a Social Network in SQL
Suppose you have a table called social_network that stores information about a social network of users. The table has the following columns:
- user_id: unique identifier for each user.
- friend_id: the user_id of a user's friend. This is also a user_id from the same table. If a user does not have any friends, then there will be no rows for that user in the table.
You want to find pairs of users who are directly or indirectly connected through their friends in the social network. Here's how you can use self-joins to accomplish this:
In this query, we use a common table expression (CTE) with the WITH RECURSIVE clause to define a recursive query that builds a set of all friends for each user. The initial query selects all pairs of (user_id, friend_id) from the social_network table. The recursive part of the query is performed by joining the friends CTE with the social_network table on the condition that the user's friend_id matches another user's user_id. This joins the friend's friends together, creating a chain of relationships.
The resulting friends CTE will contain all pairs of users who are directly or indirectly connected through their friends in the social network.
Finally, we join the friends CTE with itself (f1 with f2) on the condition that they have a common friend (f1.friend_id = f2.friend_id). We use DISTINCT and WHERE f1.user_id < f2.user_id to eliminate duplicate pairs and ensure each pair is listed only once.
This query will return a table that shows all pairs of users who are directly or indirectly connected through their friends in the social network. This can be useful for analyzing the structure of a social network, identifying potential influencers or cliques, or making friend recommendations to users.
Example 2: Analyzing Hierarchical Data in SQL
Suppose you have a table called hierarchical_data that stores information about a hierarchy of items. The table has the following columns:
- item_id: unique identifier for each item
- parent_id: the ID of the item's parent. This is also an item_id from the same table. The parent_id is NULL if an item does not have a parent, i.e., it is at the root of the hierarchy.
- item_name: name of the item
- item_value: a numerical value associated with the item
You want to analyze the data in this table to compute the sum of the item_value for each node in the hierarchy, including all of its descendants. Here's how you can use self-joins to accomplish this:
In this query, we use a common table expression (CTE) with the WITH RECURSIVE clause to define a recursive query that builds a hierarchy of items. The initial query selects all items that have a NULL parent (i.e., the root of the hierarchy).
The recursive part of the query is performed by joining the hierarchy CTE with the hierarchical_data table on the condition that the item's parent_id matches the item_id of an item in the hierarchy CTE. This joins the parent and all of its descendants together.
The resulting hierarchy CTE will contain all items in the hierarchy along with their item_value. Finally, we perform a group by item_id to compute the total value for each node in the hierarchy.
This query will return a table that shows the item_id and the total item_value for that node and all of its descendants in the hierarchy.
Example 3: Finding Adjacent Cells in a Grid in SQL
Suppose you have a table called grid that stores information about a grid of cells. The table has the following columns:
- cell_id: unique identifier for each cell.
- row_num: the row number of the cell.
- col_num: the column number of the cell.
You want to find all pairs of adjacent cells in the grid, where adjacent cells are defined as cells that share a side (i.e., they are vertically or horizontally adjacent but not diagonally adjacent).
Here's how you can use self-joins to accomplish this:
In this query, grid is joined with itself (g1 with g2) on the condition that the two cells are adjacent. We use a series of OR conditions to check whether g2 is to the left, right, above, or below g1.
If a pair of cells are adjacent, the query returns a row with the cell_id of the first cell (g1.cell_id) and the cell_id of the adjacent cell (g2.cell_id). This query can be useful for finding neighbors in a game or for analyzing spatial relationships in a grid-based system.
Table Aliases in Self Join
Aliases in SQL, if defined simply, is termed as a temporary name given to a table. While referring to the same table more than once in the SQL query, we need to distinguish each table occurrence from the others. For this reason, it is important to use aliases to uniquely identify each reference of the same table in an SQL query.
Let's say we have an EMPLOYEE TABLE on which we have to use the SQL command to perform some of the operations. If we are doing SELF JOIN on EMPLOYEE TABLE then we have to aliase it. This is done in the following way:
After the execution of the above query, we have 2 different tables named E1 & E2. We can access the attributes of these tables by E1.attributeName or E2.attributeName .
Example:
Let's suppose we have a student table that stores the following details:
- Student Id
- Student Name
- Course Id
- Course Name
- Enrolled From
Our main task here is to find all those Students Id's who are enrolled in at least 2 courses. This can be achieved by Self Joining both tables. When we Self Join the Student Table with itself on some condition then it will return all the rows of the data which satisfy the condition. This is a more clear understanding of the Query. Read along to know more.
Table on which we will be implementing Self Join
StudentID | StudentName | CourseID | CourseName | EnrolledFrom |
---|---|---|---|---|
1 | Rahul | 1 | MERN | 2019 |
2 | Shubham | 1 | MERN | 2019 |
3 | Pritish | 1 | MERN | 2019 |
4 | Mehul | 1 | MERN | 2019 |
5 | Yukti | 2 | ML | 2020 |
2 | Shubham | 3 | DSA | 2020 |
1 | Rahul | 3 | DSA | 2020 |
5 | Yukti | 4 | AI | 2021 |
Steps to Do
- Use the Select Statement and select student id from the table.
- Then, aliase both the table because here we are going to self join the table.
- And at the end, just add the condition based on which you are going to select the student id.
Query
We are going to write the query for implementing the above example
THIS LINE OF Query CREATES StudentTable in DATABASE
THIS LINE OF Query ADD DIFFERENT VALUES TO StudentTable
THIS Query WILL GIVE US OUTPUT after SelfJoin
Output:
The above query gives the following output which represents all the Student IDs and Student Names that are enrolled in at least 2 courses.
StudentID | StudentName |
---|---|
1 | Rahul |
2 | Shubham |
5 | Yukti |
Explanation:
First of all, we have created a table with some attributes. After creating the table, we have inserted the data in it. Then we have joined the Student Table with itself by first aliasing them (giving a temporary name to them) after that we have set a condition STUDENT ID SHOULD BE SAME IN BOTH THE TABLE AND COURSE ID SHOULD BE DIFFERENT IN BOTH THE TABLE. On that condition we got 3 rows as an output which is represented in the form of a table.
Conclusion
- The SELF JOIN in SQL is used to join a table to itself as if the table were two tables using table aliases with some specific conditions..
- Aliasing is temporarily renaming at least one table in the SQL statement.
- The syntax of the self join in SQL is as follow: