Concatenate and Reshape Dataframes in Pandas
Overview
When the data we need to work on is split across multiple files, such as separate training and test sets, we often need to combine it. Pandas provides the concat function for this.
Pandas is also a capable data manipulation tool that lets us restructure data with reshaping methods such as melt, stack, and unstack.
Scope
In this article, we will go over -
- Concatenating two or more Pandas DataFrames either horizontally or vertically.
- Reshaping dataframes using Pandas Melt.
- Stacking and unstacking in Pandas.
Introduction
Concatenating two datasets is a frequent task in data analysis. When the data is spread across multiple files, appending the pieces into one dataframe makes it easier to work with, and bringing related attributes together presents the data in a more comprehensible form, which helps make sense of intricate datasets.
When working with such large datasets, some attributes are useful while others are not, so grouping and reshaping the data helps improve its structure. For instance, we can work with the data more effectively if the columns are ordered by priority or by datatype.
How To Concatenate Two or More Pandas DataFrames?
Concatenating dataframes means appending or joining them either vertically or horizontally. To do this, we'll explore the Pandas concat function. We'll build our own dataset to get a clearer understanding.
Creating Dataframe to Concatenate Two or More Pandas DataFrames
Let us start off by creating a dataframe in Pandas to demonstrate how to concatenate two dataframes in Pandas.
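As a minimal sketch, the two dataframes could be built like this. The column names and values here are hypothetical stand-ins, since the article's original data is not shown; the second dataframe deliberately uses different column names from the first.

```python
import pandas as pd

# Hypothetical sample data (not the article's original values).
df1 = pd.DataFrame({
    "id": [1, 2],
    "name": ["Asha", "Ben"],
    "score": [82, 91],
})

# Same kind of information, but with different column names than df1.
df2 = pd.DataFrame({
    "student_id": [3, 4],
    "student_name": ["Chen", "Dee"],
    "marks": [77, 68],
})

print(df1)
print(df2)
```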
Concatenate Two or More Pandas DataFrames
Now that our dataframes are ready, let us concatenate them using Pandas concat.
Concatenating Vertically -
By default, dataframes are concatenated vertically, that is, one below the other. We can also state this explicitly by setting the axis parameter to 0.
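A minimal sketch of vertical concatenation, assuming two small hypothetical dataframes (illustrative names and values) whose column names differ:

```python
import pandas as pd

# Hypothetical frames with non-matching column names.
df1 = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ben"], "score": [82, 91]})
df2 = pd.DataFrame({
    "student_id": [3, 4],
    "student_name": ["Chen", "Dee"],
    "marks": [77, 68],
})

# axis=0 is the default, so these two calls are equivalent.
stacked = pd.concat([df1, df2])
stacked_explicit = pd.concat([df1, df2], axis=0)

print(stacked.shape)
print(stacked)
```

Because none of the column names match, the result keeps all six columns and fills the gaps with NaN.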
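Here, we can see that the dataframes are concatenated, but the result contains null (NaN) values and has shape [4,6], with more columns than either input. This is because concat aligns columns by name, and the two dataframes use different column names, so each row only has values in the columns of the dataframe it came from. Let us now redefine the second dataframe with matching column names and observe the results.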
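A sketch of the same concatenation once the column names match, again using hypothetical data. The ignore_index argument is an optional extra not discussed above; it renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical frames that now share the same column names.
df1 = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ben"], "score": [82, 91]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Chen", "Dee"], "score": [77, 68]})

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1, 0, 1.
combined = pd.concat([df1, df2], ignore_index=True)

print(combined.shape)
print(combined)
```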
Concatenating Horizontally -
Let us begin by creating a dataframe with the same row index as the first dataframe, since horizontal concatenation aligns rows by index. Horizontal concatenation is not the default, so we need to set the axis parameter to 1.
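A minimal sketch of horizontal concatenation, assuming two hypothetical three-row dataframes that share the default index 0, 1, 2:

```python
import pandas as pd

# Hypothetical frames with the same row index but different columns.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
right = pd.DataFrame({"score": [82, 91, 77], "grade": ["B", "A", "C"]})

# axis=1 places the frames side by side, aligning rows on the index.
side_by_side = pd.concat([left, right], axis=1)

print(side_by_side.shape)
print(side_by_side)
```

If the row indexes did not match, rows present in only one dataframe would be filled with NaN in the other's columns.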
Here, we can see that the resultant dataframe has shape [3,4] and contains no null values.
Reshaping Pandas Dataframes
Reshaping a dataframe means expanding or reducing its attributes by grouping or categorizing them. The result is clearer data, which considerably assists data analysis.
Time series forecasting is one common application of reshaping.
Let us consider another custom dataset for exploring Pandas Melt.
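A hypothetical wide-format dataset for the melt examples might look like the sketch below, with one row per student and one column per subject. The names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical wide-format data: one row per student, one column per subject.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "city": ["Pune", "Leeds", "Taipei"],
    "math": [82, 91, 77],
    "science": [75, 88, 93],
})

print(df)
```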
Using Melt
Pandas melt and its inverse, pivot (informally called "unmelt", since Pandas has no method of that name), are among the most popular methods for reshaping dataframes in Pandas.
To learn more about Pandas Melt, feel free to browse through the Pandas Melt FAQ.
Simplest Melt -
Pandas Melt transforms a wide dataframe into a longer one by turning columns into rows. It comes in handy when a specific column needs to be kept as an identifier while the remaining columns are unpivoted.
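A minimal sketch of the simplest case, assuming a hypothetical student dataframe and keeping only "name" as the identifier:

```python
import pandas as pd

# Hypothetical wide-format data.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "city": ["Pune", "Leeds", "Taipei"],
    "math": [82, 91, 77],
    "science": [75, 88, 93],
})

# Every column except "name" is unpivoted into (variable, value) pairs.
long_df = df.melt(id_vars="name")

print(long_df.shape)
print(long_df)
```

Three columns are melted across three rows, so the result has nine rows and the default column names "variable" and "value".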
Displaying custom name -
As the title suggests, we can give the resulting columns whatever names we require. Pandas Melt accepts two parameters for this: "var_name" sets the name of the column that holds the former column labels (named "variable" by default), and "value_name" sets the name of the column that holds the unpivoted values (named "value" by default).
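A short sketch of custom column names, using the same kind of hypothetical data. The names "field" and "field_value" are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical wide-format data.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "city": ["Pune", "Leeds", "Taipei"],
    "math": [82, 91, 77],
    "science": [75, 88, 93],
})

# var_name renames the "variable" column; value_name renames the "value" column.
long_df = df.melt(id_vars="name", var_name="field", value_name="field_value")

print(long_df.columns.tolist())
```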
Displaying multiple IDs -
To keep more than one column as an identifier when using Pandas Melt, we can pass a list of column names as IDs.
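A sketch with two identifier columns, assuming the same hypothetical student data:

```python
import pandas as pd

# Hypothetical wide-format data.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "city": ["Pune", "Leeds", "Taipei"],
    "math": [82, 91, 77],
    "science": [75, 88, 93],
})

# Both "name" and "city" are kept on every output row as identifiers.
long_df = df.melt(id_vars=["name", "city"], value_vars=["math", "science"])

print(long_df.shape)
print(long_df)
```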
Here, "id_vars" is the parameter naming the column or columns to be used as identifier variables, and "value_vars" names the column or columns to be unpivoted.
Specifying columns to melt -
When using Pandas Melt, all the columns other than the identifier columns are converted to rows by default. Specifying the columns to melt lets us focus only on the ones that are required.
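A sketch that melts only one chosen column, again on hypothetical data. Columns that are neither identifiers nor listed for melting are dropped from the result.

```python
import pandas as pd

# Hypothetical wide-format data.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "city": ["Pune", "Leeds", "Taipei"],
    "math": [82, 91, 77],
    "science": [75, 88, 93],
})

# Only "math" is unpivoted; "city" and "science" do not appear in the output.
math_only = df.melt(id_vars="name", value_vars=["math"])

print(math_only)
```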
Here, "id_vars" represents the identifier column and "value_vars" lists the columns to be unpivoted. The "col_level" parameter is only needed when the columns are a MultiIndex, where it selects which level to melt.
Pandas melt -
Instead of calling melt as a dataframe method, we can also call it as a top-level function of the Pandas module, passing the dataframe as the first argument.
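A minimal sketch comparing the two call styles on a small hypothetical dataframe:

```python
import pandas as pd

# Hypothetical wide-format data.
df = pd.DataFrame({
    "name": ["Asha", "Ben"],
    "math": [82, 91],
    "science": [75, 88],
})

# Module-level function: the dataframe is the first argument.
via_function = pd.melt(df, id_vars="name", value_vars=["math", "science"])

# Equivalent dataframe method call.
via_method = df.melt(id_vars="name", value_vars=["math", "science"])

print(via_function.equals(via_method))
```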
Reshaping by stacking and unstacking
The melt examples above reshape a dataset from wide to long. The stack method in Pandas does something similar using the index: it moves the column labels into a new innermost level of the row index, returning a Series or a DataFrame. The unstack method does the reverse, moving a level of the row index back to the columns.
The pivot method reshapes data based on column values. It builds the axes of the output DataFrame using the unique values from the provided index or columns.
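As a sketch of reshaping by column values, the pivot method can turn long data back into a wide table. The data and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per (student, subject) pair.
long_df = pd.DataFrame({
    "name": ["Asha", "Asha", "Ben", "Ben"],
    "subject": ["math", "science", "math", "science"],
    "score": [82, 75, 91, 88],
})

# Unique "name" values become the row index and unique "subject" values
# become the columns; "score" fills the cells.
wide_df = long_df.pivot(index="name", columns="subject", values="score")

print(wide_df)
```

Note that pivot raises an error if an index and column pair appears more than once, so each pair must be unique in the input.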
The stack method returns a Series or DataFrame whose row index has a new innermost level holding the former column labels. The unstack method pivots a level of the row index to the column axis, resulting in a reconfigured dataframe.
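A minimal sketch of stack, assuming a small hypothetical dataframe indexed by student name:

```python
import pandas as pd

# Hypothetical data with student names as the row index.
df = pd.DataFrame(
    {"math": [82, 91], "science": [75, 88]},
    index=["Asha", "Ben"],
)

# Column labels move into a new inner level of the row index.
stacked = df.stack()

print(type(stacked).__name__)
print(stacked)
```

Each value is now addressed by a (row label, column label) pair, for example stacked.loc[("Ben", "science")].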
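Here, we see that every column is represented in multiple rows, one per original row label. When the dataframe has a single level of column labels, the returned value is a Series, and its dtype is object if the stacked columns hold mixed types.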
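A sketch of unstack in two situations, using the same hypothetical data: called directly on a dataframe with a single-level row index, and called on a stacked Series to reverse the stack.

```python
import pandas as pd

# Hypothetical data with student names as the row index.
df = pd.DataFrame(
    {"math": [82, 91], "science": [75, 88]},
    index=["Asha", "Ben"],
)

# On a single-level row index, unstack returns a Series keyed by
# (column label, row label).
by_column = df.unstack()

# On the stacked Series, unstack moves the inner row level back to columns.
round_trip = df.stack().unstack()

print(by_column)
print(round_trip)
```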
Here, all the columns are represented as rows. As with stack, calling unstack on a dataframe whose row index has a single level returns a Series.
Conclusion
- In this article, we discussed techniques for concatenating and reshaping dataframes.
- Concatenating dataframes lets us combine data from multiple sources into one dataframe, which makes analysis easier.
- Two dataframes can be concatenated either horizontally or vertically using the concat method.
- Reshaping datasets helps us understand them better, as the data can be made longer or wider as needed.
- The Pandas melt method and its inverse, pivot, are used for reshaping the data.
- Another popular way to reshape data is with the stack and unstack methods, which move labels between the column axis and the row index.