Spotify Churn Project

🥅

Goals

Identify which user segments have the highest churn rates across subscription, device, age, and country

Analyze behavioral patterns — listening time, skip rate, and offline usage — between churned and active users

Validate data integrity by cross-checking ads per week against subscription type

Practice building a full EDA workflow around a target variable rather than open-ended exploration
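The ads-versus-subscription cross-check can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names `subscription_type` and `ads_listened_per_week`, the `"Free"` label, and the toy rows are all assumptions, as is the rule that paid plans should report zero ads.

```python
import pandas as pd

# Toy rows invented for illustration; column names and labels are assumptions.
df = pd.DataFrame({
    "subscription_type": ["Free", "Free", "Premium", "Family", "Student"],
    "ads_listened_per_week": [12, 30, 0, 0, 0],
})


def ads_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize weekly ad counts for each subscription type."""
    return (
        frame.groupby("subscription_type")["ads_listened_per_week"]
        .agg(["count", "min", "max", "mean"])
    )


def paid_rows_with_ads(frame: pd.DataFrame, free_label: str = "Free") -> pd.DataFrame:
    """Return rows where a non-free plan reports at least one ad (assumed invalid)."""
    is_paid = frame["subscription_type"] != free_label
    return frame[is_paid & (frame["ads_listened_per_week"] > 0)]


summary = ads_summary(df)
violations = paid_rows_with_ads(df)
print(summary)
print(f"Paid-plan rows with ads: {len(violations)}")
```

If `violations` is non-empty on the real data, those rows are worth inspecting before any churn analysis builds on them.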

💼

Process

Explored churn rates across all categorical segments — subscription type, country, gender, device, and age group

Analyzed behavioral features including listening time, skip rate, songs played, and offline listening against churn status

Built a correlation heatmap to measure how strongly each numeric feature relates to the churn target

Created 17 visualizations: bar charts, box plots, histograms, scatter plots, and a heatmap
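The segment and behavioral comparisons described in the steps above might look something like this in pandas. It is a sketch under assumptions: the column names (`subscription_type`, `device_type`, `listening_time`, `skip_rate`, `is_churned`) and the helper functions are hypothetical, and the rows are made up rather than taken from the dataset.

```python
import pandas as pd

# Toy rows invented for illustration; they are not from the project dataset.
df = pd.DataFrame({
    "subscription_type": ["Free", "Free", "Family", "Family", "Premium", "Premium"],
    "device_type": ["Mobile", "Web", "Mobile", "Desktop", "Web", "Mobile"],
    "listening_time": [120, 45, 200, 30, 90, 150],
    "skip_rate": [0.2, 0.5, 0.1, 0.6, 0.3, 0.25],
    "is_churned": [0, 1, 0, 1, 0, 0],
})


def churn_rate_by(frame: pd.DataFrame, column: str, target: str = "is_churned") -> pd.DataFrame:
    """Churn rate (%) and user count per category, highest rate first."""
    out = frame.groupby(column)[target].agg(churn_rate="mean", users="count")
    out["churn_rate"] = (out["churn_rate"] * 100).round(1)
    return out.sort_values("churn_rate", ascending=False)


def behavior_by_churn(frame: pd.DataFrame, features: list, target: str = "is_churned") -> pd.DataFrame:
    """Mean and median of each behavioral feature, split by churn status."""
    return frame.groupby(target)[features].agg(["mean", "median"])


rates = churn_rate_by(df, "subscription_type")
behavior = behavior_by_churn(df, ["listening_time", "skip_rate"])
print(rates)
print(behavior)
```

Reporting the user count next to each rate makes it easier to judge whether a gap between two segments rests on enough users to be meaningful.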

✨

Insights

Family plan users churn slightly more often than Free users (27.5% vs. 24.9%), so a paid plan does not by itself guarantee retention

Offline listening is associated with higher retention: users who download music are less likely to leave, though in a dataset this close to random the gap should be treated as tentative

The correlation heatmap showed near-zero correlation between every numeric feature and churn, which suggests the values were randomly generated rather than drawn from real user behavior

Sometimes the most valuable insight is knowing what doesn't predict churn — not every dataset tells a clean story
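The numbers behind a heatmap's churn row can be read off directly by correlating each numeric feature with the target. The sketch below assumes a 0/1 column named `is_churned` (a hypothetical name) and uses contrived toy rows in which one feature tracks churn exactly and another not at all, purely to show both ends of the scale; the real dataset showed values near zero throughout.

```python
import pandas as pd

# Contrived toy rows: skip_rate follows churn exactly, listening_time is unrelated.
df = pd.DataFrame({
    "listening_time": [60, 120, 60, 120],
    "skip_rate": [0.1, 0.1, 0.9, 0.9],
    "is_churned": [0, 0, 1, 1],
})


def correlation_with_target(frame: pd.DataFrame, target: str = "is_churned") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Rank by absolute value so strong negative correlations are not buried.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


result = correlation_with_target(df)
print(result)
```

Pearson correlation only captures linear relationships, so a near-zero value rules out a linear link but not every possible pattern; a constant column would show up as NaN.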

This dataset is synthetic and uniformly randomized — most features show near-identical distributions regardless of churn status. The key analytical skill demonstrated here is recognizing when data lacks predictive signal, rather than forcing conclusions that aren't supported. In a real-world scenario, this would prompt a data quality investigation before any modeling.
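One way to avoid forcing a conclusion from a small gap in churn rates is to check whether the gap is distinguishable from sampling noise. The function below is a minimal sketch of a two-sided two-proportion z-test using only the standard library; it is a suggested check, not something the original analysis reports running, and the counts in the example call are hypothetical.

```python
import math


def two_proportion_z_test(churned_a: int, total_a: int, churned_b: int, total_b: int):
    """Two-sided z-test for a difference between two churn rates.

    Returns (z, p_value), using the pooled-proportion standard error.
    """
    p_a = churned_a / total_a
    p_b = churned_b / total_b
    pooled = (churned_a + churned_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; both rates are 0% or 100%")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 100 users churned in one segment, 20 of 100 in another.
z, p = two_proportion_z_test(30, 100, 20, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Plugging in the actual churned and total counts for two segments would indicate whether a difference of a few percentage points is likely to be more than chance.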