This project explores a customer dataset and segments customers with K-Means clustering on age, annual income, and spending score, using the elbow method to choose the number of clusters.
You can find the code for this project on GitHub.
Data Exploration
The Mall Customer Segmentation dataset was sourced from Kaggle. It consists of five columns: Customer ID, age, gender, annual income, and spending score. Exploratory data analysis (EDA) was conducted to examine the frequency distribution and the relationships between age, annual income, and spending score. Various visualization techniques, including bar charts, swarm plots, violin plots, and pair plots, were employed to extract insights from the data.
The analysis covered:
- Frequency distribution of age, annual income, and spending score
- Total count of distinct genders
- Swarm and violin plots illustrating the distribution of age, annual income, and spending score by gender
- Pair plot showing the relationship between age, annual income, and spending score by gender
Clustering
We used K-Means clustering together with the elbow method to determine the number of clusters. The elbow method is a widely used graphical technique for choosing the number of clusters (k) in K-Means. It relies on the fact that increasing k always reduces the within-cluster sum of squares (WCSS), the sum of squared distances between points and their cluster centers. Plotting WCSS against k therefore gives a decreasing curve, and the point where the curve bends and further increases in k yield only small reductions (the "elbow") is taken as the number of clusters.
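As a rough sketch of the elbow method, the snippet below fits K-Means for a range of k values and records the WCSS, which scikit-learn exposes as `inertia_`. It assumes scikit-learn and NumPy; the data is three synthetic blobs standing in for real customer features, and the helper name `wcss_curve` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2D blobs (not the real dataset).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

def wcss_curve(X, k_values):
    """Fit K-Means once per k and return the WCSS (inertia) for each."""
    wcss = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        wcss.append(model.inertia_)
    return wcss

k_values = list(range(1, 9))
wcss = wcss_curve(X, k_values)

# Print the curve; the elbow is where the drop between consecutive k flattens out.
for k, value in zip(k_values, wcss):
    print(f"k={k}: WCSS={value:.1f}")
```

On data like this, the WCSS falls steeply up to k = 3 and changes little afterwards, so the bend sits at 3. Plotting `wcss` against `k_values` as a line chart makes the bend easier to see.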
We applied this method to cluster customers on two feature pairs, age and spending score and annual income and spending score, and visualized each result in a 2D scatter plot. Additionally, we clustered customers on all three features (age, annual income, and spending score) and visualized the results in a 3D scatter plot.
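The three-feature clustering and its 3D scatter plot might look something like the sketch below. It assumes scikit-learn, pandas, and matplotlib, and uses synthetic data with assumed column names. The value k = 5 is purely illustrative and is not taken from the project's results.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the customer data (assumed column names, random values).
rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 71, size=n),
    "Annual Income (k$)": rng.integers(15, 138, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# Illustrative choice of k; in practice it would come from the elbow plot.
k = 5
model = KMeans(n_clusters=k, n_init=10, random_state=0)
df["Cluster"] = model.fit_predict(df[features])

# 3D scatter plot with one color per cluster.
fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")
ax.scatter(
    df["Age"],
    df["Annual Income (k$)"],
    df["Spending Score (1-100)"],
    c=df["Cluster"],
    cmap="viridis",
)
ax.set_xlabel("Age")
ax.set_ylabel("Annual Income (k$)")
ax.set_zlabel("Spending Score (1-100)")
```

The same pattern covers the 2D cases: pass only two of the columns to `fit_predict` and draw an ordinary scatter plot of those two columns.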
For detailed visualizations and further insights, please refer to the README file in my GitHub repository.