Exploring Customer Data with Pandas

This example uses the Pandas library in Python to summarize a small customer dataset by sales channel, gender, and product.

Introduction

Backstory

Imagine you're an analyst at a retail company, and your team has been tasked with improving the customer experience and sales strategy. Your company operates both physical stores and an online platform. To achieve this goal, you decide to analyze customer data using Python's Pandas library to gain insights and make data-driven decisions.

Motivation

The retail company wants to understand customer behavior across different channels and demographics. This will help it tailor its marketing efforts, optimize inventory management, and enhance the overall shopping experience.

Statement

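import logging

import pandas as pd

# Configure a logger so the log.info() calls below produce output
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)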

# Sample customer data (static data)
data = {
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Age": [28, 45, 22, 38, 29],
    "Channel": ["Online", "Store", "Online", "Store", "Online"],
    "Product": ["Shoes", "Clothing", "Accessories", "Clothing", "Electronics"],
    "PurchaseAmount": [120, 75, 30, 90, 150]
}

# Create a DataFrame from the static data
customer_data = pd.DataFrame(data)

# Explore the first few rows of the dataset
log.info(f"First rows of the dataset:\n{customer_data.head()}")

# Calculate average purchase amount by channel
avg_purchase_by_channel = customer_data.groupby("Channel")["PurchaseAmount"].mean()
log.info(f"Average purchase amount by channel:\n{avg_purchase_by_channel}")

# Calculate total sales by gender
total_sales_by_gender = customer_data.groupby("Gender")["PurchaseAmount"].sum()
log.info(f"Total sales by gender:\n{total_sales_by_gender}")

# Identify the most purchased products (at most 5)
top_products = customer_data["Product"].value_counts().head(5)
log.info(f"Top products by number of purchases:\n{top_products}")

Explanation

  • Creating a DataFrame: We create a Pandas DataFrame from the sample customer data, allowing us to manipulate and analyze the data efficiently.
  • Exploring Data: We use head() to display the first rows of the dataset (five by default), helping us understand its structure. Because the sample has only five rows, this shows the whole dataset.
  • Calculating Average Purchase Amount by Channel: We group the data by the "Channel" column and calculate the average purchase amount for each channel.
  • Calculating Total Sales by Gender: We group the data by the "Gender" column and calculate the total sales for each gender.
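  • Identifying Top Products: We count the occurrences of each product with value_counts(), which sorts the counts in descending order, and keep up to five entries with head(5). The sample contains only four distinct products, so four are returned, with "Clothing" (two purchases) first.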
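
The steps above can be sketched in a single self-contained block. This is an illustrative variant, not the original code: it rebuilds the same sample data, uses print() instead of a logger, and computes several per-channel statistics in one pass with named aggregation. The result column names average, total, and orders are labels chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the example above
customer_data = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Age": [28, 45, 22, 38, 29],
    "Channel": ["Online", "Store", "Online", "Store", "Online"],
    "Product": ["Shoes", "Clothing", "Accessories", "Clothing", "Electronics"],
    "PurchaseAmount": [120, 75, 30, 90, 150],
})

# Mean, sum, and row count of PurchaseAmount per channel.
# "average", "total", and "orders" are illustrative column labels.
channel_summary = customer_data.groupby("Channel")["PurchaseAmount"].agg(
    average="mean", total="sum", orders="count"
)
print(channel_summary)  # Online: average 100.0, total 300; Store: average 82.5, total 165

# Total sales per gender
gender_totals = customer_data.groupby("Gender")["PurchaseAmount"].sum()
print(gender_totals)  # Female 165, Male 300

# Number of purchases per product, most frequent first
product_counts = customer_data["Product"].value_counts()
print(product_counts.head(5))  # only 4 distinct products exist in the sample
```

In this sample every online purchase comes from a male customer and every store purchase from a female customer, so the per-channel and per-gender totals coincide. With real data the two groupings would normally differ.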

Conclusion

This Pandas-based analysis shows how a retail company can ground its decisions in observed customer behavior. The same groupby() and value_counts() calls used on this five-row sample apply unchanged to much larger datasets, supporting data-driven improvements in marketing, inventory management, and the overall customer experience.