GreyNoise Customer Churn Predictor

Python

Machine Learning

Predictive Modeling

Web App

A Python-based machine learning application deployed with Streamlit to predict customer churn in real-time for GreyNoise Intelligence.

Author

Christopher Hynes

Published

July 18, 2025

Live Application | View the Notebook | GitHub Repository

Project Overview

This project is a Python-based machine learning application designed to predict customer churn at GreyNoise Intelligence. It serves as a modernized, deployable version of an initial analysis I conducted during an internship using R. The primary goal is to provide an interactive tool to assess churn risk in real-time.

To protect proprietary information, the public-facing version of this project, including the live app and the code repository, uses a synthetically generated dataset.

Model Performance

The logistic regression model was validated using a Stratified 5-Fold Cross-Validation strategy. The metrics below reflect the performance on the original, proprietary dataset.

Metric	Score
Mean Accuracy	0.79
Mean ROC AUC	0.70
Mean PR-AUC	0.54

An ROC AUC score of 0.70 demonstrates a good capability to distinguish between customers who will churn and those who will not. The PR-AUC score of 0.54 is particularly important for this imbalanced dataset, indicating that the model is substantially better at identifying churners than a random model.

Key Predictive Factors

Analysis of the model’s coefficients reveals the most significant factors influencing churn predictions.

Top Factors Increasing Churn Risk

Geographic Region: The model identified that customers from certain geographic regions had a significantly higher propensity to churn.
Unknown Account History: When a customer’s prior account signup status was unknown, they were flagged as a high churn risk.
Specific Industry Segments: Customers within certain specialized industries showed a higher tendency to churn.

Top Factors Decreasing Churn Risk

Acquisition Source: Customers who came to GreyNoise via direct traffic were the least likely to churn.
Existing Account History: Knowing a customer had a previous free account was a strong signal of loyalty and a significantly lower churn risk.
Annual Recurring Revenue (ARR): As a customer’s ARR increased, their likelihood of churn decreased substantially.

Tech Stack & Methodology

Modeling: Python 3.11, scikit-learn, pandas, numpy, joblib
Web App: Streamlit
Validation: Stratified K-Fold Cross-Validation
Source Code: The gn_churn.ipynb notebook contains the complete code for data processing and modeling.