GreyNoise Customer Churn Predictor

Python
Machine Learning
Predictive Modeling
Web App
A Python-based machine learning application, deployed with Streamlit, that predicts customer churn in real time for GreyNoise Intelligence.
Author

Christopher Hynes

Published

July 18, 2025

Live Application | View the Notebook | GitHub Repository

Project Overview

This project is a Python-based machine learning application designed to predict customer churn at GreyNoise Intelligence. It is a modernized, deployable version of an analysis I originally conducted in R during an internship. The primary goal is to provide an interactive tool for assessing churn risk in real time.

To protect proprietary information, the public-facing version of this project, including the live app and the code repository, uses a synthetically generated dataset.
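As a rough illustration of what a synthetic stand-in dataset can look like, the sketch below generates a small table with NumPy and pandas. It is not the generator used for this project: the column names, category values, distribution parameters, and the rule linking ARR to churn are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_customers = 500

# Hypothetical columns and category values, chosen only for illustration.
synthetic = pd.DataFrame({
    "region": rng.choice(["AMER", "EMEA", "APAC"], size=n_customers),
    "acquisition_source": rng.choice(["direct", "partner", "paid"], size=n_customers),
    "arr": rng.lognormal(mean=10.0, sigma=0.5, size=n_customers).round(2),
})

# Assumed relationship: churn odds fall as log(ARR) rises above its typical value.
logit = -1.1 - 1.0 * (np.log(synthetic["arr"].to_numpy()) - 10.0)
churn_probability = 1.0 / (1.0 + np.exp(-logit))

# Draw one Bernoulli outcome per customer from its churn probability.
synthetic["churned"] = rng.binomial(1, churn_probability)

print(synthetic.head())
print(f"Synthetic churn rate: {synthetic['churned'].mean():.2f}")
```

Because every value is drawn from a made-up distribution, a table like this carries no customer information, but it also cannot reproduce the metrics measured on the real data.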


Model Performance

The logistic regression model was validated using a Stratified 5-Fold Cross-Validation strategy. The metrics below reflect the performance on the original, proprietary dataset.

Metric           Score
Mean Accuracy    0.79
Mean ROC AUC     0.70
Mean PR-AUC      0.54

An ROC AUC of 0.70 indicates a reasonable, though not strong, ability to distinguish customers who will churn from those who will not. The PR-AUC of 0.54 is the more informative metric for this imbalanced dataset: a random model's PR-AUC is approximately equal to the churn rate, so a score of 0.54 indicates that the model identifies churners substantially better than chance.
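The validation approach described above can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the notebook's actual code: the data come from `make_classification` rather than the GreyNoise dataset, the pipeline with `StandardScaler` is an assumption, and PR-AUC is approximated with scikit-learn's `average_precision` scorer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (about 25% positives); not the GreyNoise data.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.75, 0.25], random_state=42,
)

# Assumed preprocessing: scale features, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the churn rate roughly constant across the 5 splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_validate(
    model, X, y, cv=cv,
    scoring={"accuracy": "accuracy", "roc_auc": "roc_auc", "pr_auc": "average_precision"},
)

# Average each metric over the 5 held-out folds.
cv_summary = {
    name: float(np.mean(scores[f"test_{name}"]))
    for name in ("accuracy", "roc_auc", "pr_auc")
}

# A no-skill classifier's PR-AUC is roughly the positive-class rate.
pr_auc_baseline = float(y.mean())

for name, value in cv_summary.items():
    print(f"Mean {name}: {value:.2f}")
print(f"PR-AUC baseline (positive rate): {pr_auc_baseline:.2f}")
```

Comparing the mean PR-AUC against the positive-class rate, rather than against 0.5, is what makes the metric interpretable on imbalanced data.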


Key Predictive Factors

Analysis of the model’s coefficients reveals the most significant factors influencing churn predictions.

Top Factors Increasing Churn Risk

  1. Geographic Region: The model identified that customers from certain geographic regions had a significantly higher propensity to churn.
  2. Unknown Account History: When a customer’s prior account signup status was unknown, they were flagged as a high churn risk.
  3. Specific Industry Segments: Customers within certain specialized industries showed a higher tendency to churn.

Top Factors Decreasing Churn Risk

  1. Acquisition Source: Customers who came to GreyNoise via direct traffic were the least likely to churn.
  2. Existing Account History: A known previous free account was a strong signal of loyalty and was associated with a significantly lower churn risk.
  3. Annual Recurring Revenue (ARR): As a customer’s ARR increased, their likelihood of churn decreased substantially.

Tech Stack & Methodology

  • Modeling: Python 3.11, scikit-learn, pandas, numpy, joblib
  • Web App: Streamlit
  • Validation: Stratified K-Fold Cross-Validation
  • Source Code: The gn_churn.ipynb notebook contains the complete code for data processing and modeling.
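To show how the pieces of this stack can fit together, the sketch below persists a fitted scikit-learn pipeline with joblib and reloads it to score a single customer, which is the kind of call a web front end would make. It is an assumption-laden illustration rather than the project's code: the training rows, the feature names, the file name `churn_model.joblib`, and the helper `predict_churn_probability` are all hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented training rows; the real features are not described in detail.
train = pd.DataFrame({
    "acquisition_source": ["direct", "partner", "paid", "direct",
                           "partner", "paid", "direct", "paid"],
    "arr": [60000.0, 12000.0, 15000.0, 80000.0, 9000.0, 20000.0, 55000.0, 11000.0],
    "churned": [0, 1, 1, 0, 1, 0, 0, 1],
})
features = ["acquisition_source", "arr"]

pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["acquisition_source"]),
        ("num", StandardScaler(), ["arr"]),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(train[features], train["churned"])

# Persist the whole pipeline so preprocessing and model travel together.
model_path = Path(tempfile.mkdtemp()) / "churn_model.joblib"  # hypothetical file name
joblib.dump(pipeline, model_path)


def predict_churn_probability(model, customer: dict) -> float:
    """Return the predicted probability of churn (class 1) for one customer."""
    row = pd.DataFrame([customer])
    return float(model.predict_proba(row)[0, 1])


# An app would typically load the model once, then score user input on demand.
loaded = joblib.load(model_path)
customer = {"acquisition_source": "direct", "arr": 50000.0}
risk = predict_churn_probability(loaded, customer)
print(f"Estimated churn probability: {risk:.2f}")
```

Saving the preprocessing steps and the classifier as one object avoids a mismatch between how features were encoded at training time and how they are encoded when the app scores new input.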