In this Python Pandas tutorial we are going to talk about data manipulation techniques in Pandas. Python is one of the most popular programming languages for data manipulation and analysis, because it is easy to use and has a rich ecosystem of libraries. Among these libraries, Pandas is one of the most powerful tools for data manipulation: it provides functionality for handling, cleaning and transforming data. In this tutorial we will work through these techniques with practical examples.
The first step in data manipulation is loading data into a Pandas DataFrame. Pandas can read many formats, including CSV, Excel and SQL databases, among others. Suppose we have a CSV file named data.csv that contains our dataset. We can load it with the following code:
```python
import pandas as pd

data = pd.read_csv('data.csv')
```
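If you want to try the examples without creating a file first, read_csv() also accepts a file-like object instead of a path. The sketch below assumes the four-row sample data.csv shown at the end of this tutorial and embeds it as a string through io.StringIO; the variable name csv_text is only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample data.csv from this tutorial,
# so the example runs without a file on disk.
csv_text = """name,age,category,price
GeeksCoders,32,A,50.0
Jane Smith,28,B,75.0
Alice Johnson,35,A,60.0
Bob Williams,42,C,80.0
"""

# read_csv accepts any file-like object, not only a filename.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (rows, columns)
print(data.dtypes)  # the column types pandas inferred
```

Reading from a string like this is handy for quick experiments; for real work you would pass the file path as in the original snippet.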
Once the data is loaded, it is important to get an overview of its structure and contents. Pandas provides several methods for exploring a DataFrame, such as head(), tail(), info() and describe(). Let’s see them in action:
```python
print(data.head())      # Display the first few rows (5 by default)
print(data.tail())      # Display the last few rows (5 by default)
data.info()             # Print a summary: columns, dtypes, non-null counts (returns None)
print(data.describe())  # Descriptive statistics for the numeric columns
```
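A short sketch of the same exploration calls with a couple of variations, assuming a small DataFrame built inline from the same four rows as the sample data.csv. Note that info() prints its report itself and returns None, so wrapping it in print() only adds a stray "None" to the output.

```python
import pandas as pd

# Same four rows as the sample data.csv used in this tutorial.
data = pd.DataFrame({
    "name": ["GeeksCoders", "Jane Smith", "Alice Johnson", "Bob Williams"],
    "age": [32, 28, 35, 42],
    "category": ["A", "B", "A", "C"],
    "price": [50.0, 75.0, 60.0, 80.0],
})

print(data.head(2))       # pass n to show a different number of rows
data.info()               # prints directly; no print() needed
stats = data.describe()   # by default only numeric columns (age, price) are summarized
print(stats)
print(stats.loc["mean"])  # describe() returns a DataFrame, so it can be indexed
```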
Filtering data allows us to extract specific rows or columns based on certain conditions. We can use comparison operators such as ==, !=, >, <, >= and <= together with Pandas boolean indexing. For example, let’s say we want to filter the dataset to only include rows where the ‘age’ column is greater than 30:
```python
filtered_data = data[data['age'] > 30]
```
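The filter above uses a single condition. A minimal sketch of combining several conditions, assuming an inline DataFrame with the sample rows: boolean masks are joined element-wise with & (and), | (or) and ~ (not), and each comparison needs its own parentheses because & and | bind more tightly than comparison operators.

```python
import pandas as pd

# Same four rows as the sample data.csv used in this tutorial.
data = pd.DataFrame({
    "name": ["GeeksCoders", "Jane Smith", "Alice Johnson", "Bob Williams"],
    "age": [32, 28, 35, 42],
    "category": ["A", "B", "A", "C"],
    "price": [50.0, 75.0, 60.0, 80.0],
})

# Rows where age > 30 AND category is "A".
mask = (data["age"] > 30) & (data["category"] == "A")
older_in_a = data[mask]

# .loc can filter rows and select columns in one step.
names_over_30 = data.loc[data["age"] > 30, ["name", "age"]]

print(older_in_a)
print(names_over_30)
```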
Sorting the data makes it easier to analyze and visualize. Pandas provides the sort_values() method to sort a DataFrame by one or more columns. Let’s sort the dataset in ascending order by the age column:
```python
sorted_data = data.sort_values('age')
```
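sort_values() also accepts a list of columns and an ascending argument, either a single boolean or one boolean per column. A sketch, assuming an inline DataFrame with the sample rows:

```python
import pandas as pd

# Same four rows as the sample data.csv used in this tutorial.
data = pd.DataFrame({
    "name": ["GeeksCoders", "Jane Smith", "Alice Johnson", "Bob Williams"],
    "age": [32, 28, 35, 42],
    "category": ["A", "B", "A", "C"],
    "price": [50.0, 75.0, 60.0, 80.0],
})

# Descending order by age.
oldest_first = data.sort_values("age", ascending=False)

# Sort by category (A to Z), then by price (highest first) within each category.
by_cat_price = data.sort_values(["category", "price"], ascending=[True, False])

print(oldest_first)
print(by_cat_price)
```

sort_values() returns a new DataFrame and leaves the original row order untouched unless you pass inplace=True.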
Dealing with missing data is an important part of data manipulation. Pandas provides several methods to handle missing values, such as dropna(), fillna() and interpolate(). Let’s consider an example where we want to drop rows with any missing values:
```python
cleaned_data = data.dropna()
```
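dropna() discards whole rows, which is not always what you want. The sketch below shows fillna() and interpolate() on a small hypothetical DataFrame with gaps (the sample data.csv has no missing values). The fill values chosen here, the column mean for age and 0.0 for price, are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values.
df = pd.DataFrame({
    "age": [32, np.nan, 35, 42],
    "price": [50.0, 75.0, np.nan, 80.0],
})

# Drop every row that contains at least one NaN.
dropped = df.dropna()

# Fill NaN per column: age with the column mean, price with 0.0.
filled = df.fillna({"age": df["age"].mean(), "price": 0.0})

# Linear interpolation: each gap gets the midpoint of its neighbours here.
interpolated = df.interpolate()

print(dropped)
print(filled)
print(interpolated)
```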
Grouping data allows us to perform calculations and aggregations on subsets of the data that share a common value. The groupby() method is used for this in Pandas. In the following example, we group the data by the category column and calculate the average of the price column for each category:
```python
grouped_data = data.groupby('category')['price'].mean()
```
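To compute several statistics at once, groupby() can be followed by agg(). A sketch with the sample rows, showing both a list of function names and named aggregation in the form output_name=(column, function); the output names avg_price and oldest are arbitrary choices for this example.

```python
import pandas as pd

# Same four rows as the sample data.csv used in this tutorial.
data = pd.DataFrame({
    "name": ["GeeksCoders", "Jane Smith", "Alice Johnson", "Bob Williams"],
    "age": [32, 28, 35, 42],
    "category": ["A", "B", "A", "C"],
    "price": [50.0, 75.0, 60.0, 80.0],
})

# Several statistics of one column per category.
summary = data.groupby("category")["price"].agg(["mean", "count", "max"])

# Named aggregation across different columns.
per_category = data.groupby("category").agg(
    avg_price=("price", "mean"),
    oldest=("age", "max"),
)

print(summary)
print(per_category)
```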
Merging or joining multiple DataFrames is often necessary when working with complex datasets. Pandas provides different functions like concat(), merge() and join() to combine DataFrames. Let’s see an example where we merge two DataFrames based on a common column:
```python
# data1 and data2 are two DataFrames that both contain a 'common_column' column
merged_data = pd.merge(data1, data2, on='common_column')
```
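A sketch of how the how parameter of merge() changes the result, and of concat() for stacking DataFrames, using two small hypothetical tables (products and labels) that share a category column in place of data1 and data2.

```python
import pandas as pd

# Hypothetical tables sharing a "category" key; "D" has no matching product.
products = pd.DataFrame({"category": ["A", "B", "C"], "price": [50.0, 75.0, 80.0]})
labels = pd.DataFrame({"category": ["A", "B", "D"], "label": ["Basic", "Premium", "Other"]})

# how="inner" (the default) keeps only keys present in both tables.
inner = pd.merge(products, labels, on="category")

# how="left" keeps every row of the left table; unmatched labels become NaN.
left = pd.merge(products, labels, on="category", how="left")

# concat stacks DataFrames along an axis instead of matching on a key.
stacked = pd.concat([products, products], ignore_index=True)

print(inner)
print(left)
print(stacked)
```

Use merge() when rows should be matched by key values and concat() when you simply want to append tables with the same columns.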
This is the complete code (the merge step is left out, because it needs a second DataFrame):
```python
import pandas as pd

# 1. Loading Data
data = pd.read_csv('data.csv')

# 2. Exploring the Data
print(data.head())      # Display the first few rows of the DataFrame
print(data.tail())      # Display the last few rows of the DataFrame
data.info()             # Print the DataFrame's summary information
print(data.describe())  # Generate descriptive statistics of the DataFrame

# 3. Filtering Data
filtered_data = data[data['age'] > 30]

# 4. Sorting Data
sorted_data = data.sort_values('age')

# 5. Handling Missing Data
cleaned_data = data.dropna()

# 6. Grouping and Aggregating Data
grouped_data = data.groupby('category')['price'].mean()
```
This is our data.csv
```csv
name,age,category,price
GeeksCoders,32,A,50.0
Jane Smith,28,B,75.0
Alice Johnson,35,A,60.0
Bob Williams,42,C,80.0
```
This will be the output
