Telco Customer Churn Prediction
Python
Machine Learning
Predictive Modeling
Web App
An end-to-end machine learning project that involves cleaning, training, and deploying an interactive web application to predict customer churn on the classic Telco dataset.
Live Application | View the Notebook | GitHub Repository
Project Overview
This project demonstrates a complete machine learning workflow for predicting customer churn for TElco, a telecom company. It involves data cleaning, model training, feature importance analysis, and the deployment of an interactive web application using Streamlit.
By identifying at-risk customers, the business can proactively offer incentives or support to retain them, which is often more cost-effective than acquiring new customers.
Key Features
- Interactive Prediction: Use sliders and dropdowns to input customer data and receive a real-time churn prediction.
- Clear Results: The app displays a clear, understandable prediction of whether a customer is likely to churn.
- Prediction Confidence: Shows the model’s confidence score (probability) for each prediction.
- Optimized Model: Powered by a pruned logistic regression model that has been optimized through feature selection for efficiency and interpretability.
Methodology
The project follows a standard data science workflow:
a. Data Cleaning and Preprocessing
- The
TotalCharges
column was converted to a numeric type, and rows with missing values were dropped. - The non-predictive
customerID
column was removed. - The target variable
Churn
was converted to a binary format (1/0).
b. Model Training and Feature Selection
- A Logistic Regression model was chosen for its effectiveness and high interpretability.
- A
ColumnTransformer
pipeline was used to automatically applyStandardScaler
to numerical features andOneHotEncoder
to categorical features. - Feature Importance Analysis: The model’s coefficients were analyzed, revealing that
gender
,PhoneService
, andPartner
had a negligible impact on predictive power. - Model Pruning: A second, simpler model was trained with these unimportant features removed. This pruned model achieved the same performance (F1-score of 0.61) and was selected for the final application.
Tools and Libraries
- Data Analysis & Modeling: Pandas, Scikit-learn, Jupyter
- Web Application: Streamlit
- Plotting: Matplotlib, Seaborn