Goals
- Identify which user segments have the highest churn rates across subscription, device, age, and country
- Analyze behavioral patterns — listening time, skip rate, and offline usage — between churned and active users
- Validate data integrity by cross-checking ads per week against subscription type
- Practice building a full EDA workflow around a target variable rather than open-ended exploration
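The integrity check in the goals above could look roughly like the sketch below. It assumes a pandas DataFrame with columns named `subscription_type` and `ads_listened_per_week`, and it assumes the rule being checked is "paid tiers should hear no ads". Neither the column names nor the rule is stated in this write-up, so treat both as placeholders.

```python
import pandas as pd

# Column names and the "Free" label are assumptions, not confirmed by the dataset.
SUB_COL = "subscription_type"
ADS_COL = "ads_listened_per_week"


def ads_by_subscription(df, sub_col=SUB_COL, ads_col=ADS_COL):
    """Summarize weekly ad counts for each subscription type."""
    return df.groupby(sub_col)[ads_col].agg(["count", "min", "max", "mean"])


def paid_rows_with_ads(df, free_label="Free", sub_col=SUB_COL, ads_col=ADS_COL):
    """Return rows on a non-free plan that still report ads (possible integrity issue)."""
    is_paid = df[sub_col] != free_label
    return df[is_paid & (df[ads_col] > 0)]


# Tiny made-up frame, only to show the shape of the check.
toy = pd.DataFrame(
    {
        SUB_COL: ["Free", "Premium", "Family", "Free"],
        ADS_COL: [12, 0, 3, 7],
    }
)

summary = ads_by_subscription(toy)
violations = paid_rows_with_ads(toy)
print(summary)
print(violations)
```

An empty `violations` frame would suggest the ads column is consistent with the subscription column under the assumed rule; any rows returned are worth inspecting before the column is used in analysis.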
Process
- Explored churn rates across all categorical segments — subscription type, country, gender, device, and age group
- Analyzed behavioral features including listening time, skip rate, songs played, and offline listening against churn status
- Built a correlation heatmap to measure numeric feature relationships with the target variable
- Created 17 visualizations across bar charts, box plots, histograms, scatter plots, and a heatmap
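The segment and correlation steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the target column name `is_churned` (coded 0/1) and the feature column names are assumptions, and the toy rows are made up.

```python
import pandas as pd


def churn_rate_by(df, segment_col, target_col="is_churned"):
    """Churn rate (mean of a 0/1 target) and group size per segment, highest rate first."""
    out = df.groupby(segment_col)[target_col].agg(churn_rate="mean", users="size")
    return out.sort_values("churn_rate", ascending=False)


def target_correlations(df, target_col="is_churned"):
    """Pearson correlation of each numeric feature with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target_col].drop(target_col)
    return corr.sort_values(key=abs, ascending=False)


# Made-up rows, only to show the calls; column names are hypothetical.
toy = pd.DataFrame(
    {
        "subscription_type": ["Free", "Free", "Family", "Family", "Premium", "Premium"],
        "listening_time": [30, 120, 45, 200, 60, 90],
        "skip_rate": [0.5, 0.1, 0.4, 0.2, 0.3, 0.3],
        "is_churned": [1, 0, 1, 1, 0, 0],
    }
)

rates = churn_rate_by(toy, "subscription_type")
corrs = target_correlations(toy)
print(rates)
print(corrs)
```

Keeping the group size next to each rate makes it easier to see when a high churn rate comes from a very small segment. The full matrix from `df.corr(numeric_only=True)` is what a heatmap would typically be drawn from.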
Insights
- Family plan users churn slightly more than Free users (27.5% vs 24.9%, a 2.6-point gap), so a paid plan does not guarantee retention
- Offline listening is associated with higher retention: in this dataset, users who download music churned less often
- The correlation heatmap showed near-zero linear correlation between every numeric feature and churn, which is consistent with the dataset being synthetically randomized rather than reflecting real-world patterns
- Sometimes the most valuable insight is knowing what doesn't predict churn — not every dataset tells a clean story
This dataset is synthetic and uniformly randomized — most features show near-identical distributions regardless of churn status. The key analytical skill demonstrated here is recognizing when data lacks predictive signal, rather than forcing conclusions that aren't supported. In a real-world scenario, this would prompt a data quality investigation before any modeling.
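One way to check whether a gap between two segments' churn rates is larger than sampling noise is a two-proportion z-test. The sketch below uses only the standard library. The group sizes are not given in this write-up, so the counts in the example call are hypothetical and chosen only to illustrate rates of roughly 27.5% and 24.9%; they are not taken from the dataset.

```python
import math


def two_proportion_z(churned_a, n_a, churned_b, n_b):
    """Two-sided z-test for a difference between two churn proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a = churned_a / n_a
    p_b = churned_b / n_b
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts: 1,000 users per group is an assumption, not a dataset fact.
z, p = two_proportion_z(churned_a=275, n_a=1000, churned_b=249, n_b=1000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With the real group sizes substituted in, a large p-value would support reading the segment gap as noise, in line with the near-identical distributions described above.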