Pandas Data Analysis CheatSheet

A perfect cheat sheet for your data analysis in Pandas.

Introduction

Pandas is the most important library for data scientists and Data Analysis. Pandas is a powerful data manipulation library for Python, used for data cleaning, transformation, and analysis. With its intuitive and efficient data structures, Pandas enables users to easily work with and analyze large datasets, making it a go-to tool for data professionals and enthusiasts alike.

In this blog post, I will try to create a cheat sheet for Data analysts and Data Scientists. It will be a handy tool for quick exploration of datasets and analysis.

Importing Pandas

import pandas as pd

Reading Data

data = pd.read_csv('filename.csv') # Reading CSV files
data = pd.read_excel('filename.xlsx') # Reading Excel files

You can also load many other files such as Parquet, arrow, etc.

Descriptive Statistics

data.describe()

Looking at the head and tail of the data

# Show the first 20 rows
data.head()

# Show the last 20 rows

data.tail()

# YOu can also give the rows as an argument
data.head(5) # will return only first 5 rows

Shape and Columns of data

# returns the number of rows and columns
data.shape

# Return the column names
data.columns

# You can create a list from the column names
col_name = list(data.columns)

Indexing and selecting Data

# select a column
data['col_name']
or
data.col_name

# Select multiple columns
data[['col1', 'col2']]

# select col with location
# [[]] return a DataFrame
data.loc[['col1', 'col2']]

# select data using index number
data.iloc[[0]] # first row
data.iloc[[0,1]] # first two row
# you can use slice
data.iloc[:3]

# or you can use lambda function 
# select all the even rows
data.iloc[lambda x: x.index % 2 == 0]

Filtering Data

# selecting row based on condition
data[data['column_name'] > value]

# Selects rows where column value is in a list
data[data['column_name'].isin([value1, value2])] 

# Selects rows where column value contains text
data[data['column_name'].str.contains('text')]

Sorting Data

 # Sorts data by column_name in ascending order
data.sort_values('column_name', ascending=True)

# Sorts data by multiple columns in different order
data.sort_values(['col1', 'col2'], ascending=[True, False])

Applying Functions

# Applies function to column and returns new column
data['new_col'] = data['column_name'].apply(function) 

# Applies function to row and returns new column
data['new_col'] = data.apply(lambda row: function(row['col1'], row['col2']), axis=1)

Reshaping Data

# Creates a pivot table
data.pivot_table(index='col1', columns='col2', values='col3', aggfunc='mean') 

# Melts data from wide to long format
pd.melt(data, id_vars=['col1'], value_vars=['col2', 'col3'])

Cleaning Data

data.dropna() # Removes rows with missing values
data.fillna(value) # Fills missing values with a given value
data.drop_duplicates() # Removes duplicate rows
data.replace(to_replace, value) # Replaces values in the data

Combining Data

pd.concat([data1, data2], axis=0) # Concatenates data vertically
pd.concat([data1, data2], axis=1) # Concatenates data horizontally
pd.merge(data1, data2, on='column_name') # Merges data based on a common column

Quick Visualizing Data

data.plot() # Plots data
data.plot(kind='bar') # Plots a bar chart
data.plot(kind='box') # Plots a box plot
data.plot(kind='hist') # Plots a histogram
data.plot(kind='scatter', x='col1', y='col2') # Plots a scatter plot

Aggregating Data

 # Aggregates data by column_name and returns sum of col1 and mean of col2
data.groupby('column_name').agg({'col1': 'sum', 'col2': 'mean'})

Convert column to DateTime

# Converts column to datetime format
data['date_column'] = pd.to_datetime(data['date_column'])

Saving Data

data.to_csv('filename.csv', index=False) # Saves data to a CSV file
data.to_excel('filename.xlsx', index=False) # Saves data to an Excel file

Conclusion

Pandas is an incredibly versatile and powerful tool for data analysis in Python. With its robust data structures and numerous functions, pandas make it easy to clean, manipulate, and analyze large datasets. By understanding and mastering these techniques, you can unleash the full potential of pandas and become a more effective data analyst. I hope this blog has been informative and valuable in your journey toward mastering data analysis with pandas. Thank you for reading, and I wish you the best of luck in your data analysis endeavors!